GENES FOR THE BIOSYNTHESIS OF EPOTHILONES

CROSS REFERENCE TO RELATED APPLICATIONS 



This application claims the benefit of U.S. Provisional Application No. , filed
18 June 1998 [Schupp et al.; Attorney Docket No. CGC2012/PROV]; U.S. Provisional
Application No. 60/101,631, filed 24 September 1998; and U.S. Provisional Application No. 
60/118,906, filed 5 February 1999. The full disclosure of each of these provisional
applications is incorporated herein by reference. 



FIELD OF THE INVENTION 



The present invention relates generally to polyketides and genes for their synthesis. 
In particular, the present invention relates to the isolation and characterization of novel
polyketide synthase and nonribosomal peptide synthetase genes from Sorangium cellulosum
that are necessary for the biosynthesis of epothilones A and B. 



BACKGROUND OF THE INVENTION 



Polyketides are compounds synthesized from two-carbon building blocks, the β-carbon
of which always carries a keto group, thus the name polyketide. These compounds
include many important antibiotics, immunosuppressants, cancer chemotherapeutic agents, 
and other compounds possessing a broad range of biological properties. The tremendous 
structural diversity derives from the different lengths of the polyketide chain, the different 
side-chains introduced (either as part of the two-carbon building blocks or after the poly- 
ketide backbone is formed), and the stereochemistry of such groups. The keto groups may 
also be reduced to hydroxyls, enoyls, or removed altogether. Each round of two-carbon 
addition is carried out by a complex of enzymes called the polyketide synthase (PKS) in a 
manner similar to fatty acid biosynthesis. 

The biosynthetic genes for an increasing number of polyketides have been isolated 
and sequenced. For example, see U.S. Patent Nos. 5,639,949, 5,693,774, and 5,716,849, 
all of which are incorporated herein by reference, which describe genes for the biosynthesis 
of soraphen. See also, Schupp et al., FEMS Microbiology Letters 159: 201-207 (1998) and
WO 98/07868, which describe genes for the biosynthesis of rifamycin, and U.S. Patent No.
5,876,991, which describes genes for the biosynthesis of tylactone, all of which are incorpo- 
rated herein by reference. The encoded proteins generally fall into two types: type I and 
type II. Type I proteins are polyfunctional, with several catalytic domains carrying out different
enzymatic steps covalently linked together (e.g. PKS for erythromycin, soraphen, rifamycin,
and avermectin (MacNeil et al., in Industrial Microorganisms: Basic and Applied Molecular
Genetics, (ed.: Baltz et al.), American Society for Microbiology, Washington D.C. pp.
245-256 (1993))); whereas type II proteins are monofunctional (Hutchinson et al., in Industrial
Microorganisms: Basic and Applied Molecular Genetics, (ed.: Baltz et al.), American Society
for Microbiology, Washington D.C. pp. 203-216 (1993)).

For the simpler polyketides such as actinorhodin (produced by Streptomyces 
coelicolor), the several rounds of two-carbon additions are carried out iteratively on PKS 
enzymes encoded by one set of PKS genes. In contrast, synthesis of the more complicated 
compounds such as erythromycin and soraphen involves PKS enzymes that are organized 
into modules, whereby each module carries out one round of two-carbon addition (for review, 
see Hopwood et al., in Industrial Microorganisms: Basic and Applied Molecular Genetics,
(ed.: Baltz et al.), American Society for Microbiology, Washington D.C., pp. 267-275
(1993)).
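
By way of illustration only, the modular organization described above may be sketched in Python as follows. The sketch is a hypothetical model, not a description of any synthase disclosed herein: the class Module, the functions beta_carbon_state and run_assembly_line, and the four toy modules are all assumptions introduced for the example. KR (ketoreductase) and DH (dehydratase) are the conventional abbreviations for the reductive domains that, together with ER, determine whether the keto group formed in a given round is retained, reduced to a hydroxyl, converted to an enoyl, or removed altogether.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical PKS module: a name plus the catalytic domains it carries."""
    name: str
    domains: frozenset  # e.g. {"KS", "AT", "ACP", "KR", "DH", "ER"}


def beta_carbon_state(module: Module) -> str:
    """Return the state of the beta-carbon after this module's round of extension.

    Assumption: the reductive domains act in the order KR -> DH -> ER,
    and a later domain has no effect if an earlier one is absent.
    """
    d = module.domains
    if "KR" not in d:
        return "keto"       # keto group left untouched
    if "DH" not in d:
        return "hydroxyl"   # ketoreduction only
    if "ER" not in d:
        return "enoyl"      # reduction followed by dehydration
    return "methylene"      # fully reduced: keto group removed altogether


def run_assembly_line(modules):
    """Each module carries out exactly one round of two-carbon addition, in order."""
    return [(m.name, beta_carbon_state(m)) for m in modules]


CORE = {"KS", "AT", "ACP"}  # minimal domain set assumed for one extension round
toy_pks = [
    Module("module 1", frozenset(CORE)),
    Module("module 2", frozenset(CORE | {"KR"})),
    Module("module 3", frozenset(CORE | {"KR", "DH"})),
    Module("module 4", frozenset(CORE | {"KR", "DH", "ER"})),
]
result = run_assembly_line(toy_pks)
```

In this toy model the order and domain content of the modules alone fix the pattern of keto, hydroxyl, enoyl, and fully reduced positions along the chain, which is the sense in which a modular synthase differs from an iteratively used one.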

Complex polyketides and secondary metabolites in general may contain substructu- 
res that are derived from amino acids instead of simple carboxylic acids. Incorporations of 
these building blocks are accomplished by non-ribosomal polypeptide synthetases (NRPSs).
NRPSs are multienzymes that are organized in modules. Each module is responsible for the
addition (and the additional processing, if required) of one amino acid building block. 
NRPSs activate amino acids by forming aminoacyl-adenylates, and capture the activated 
amino acids on thiol groups of phosphopantetheinyl prosthetic groups on peptidyl carrier
protein domains. Further, NRPSs modify the amino acids by epimerization, N-methylation, 
or cyclization if necessary, and catalyse the formation of peptide bonds between the 
enzyme-bound amino acids. NRPSs are responsible for the biosynthesis of peptide secondary
metabolites like cyclosporin, can provide polyketide chain terminator units as in rapamycin,
or form mixed systems with PKSs as in yersiniabactin biosynthesis.

Epothilones A and B are 16-membered macrocyclic polyketides with an acylcysteine- 
derived starter unit that are produced by the bacterium Sorangium cellulosum strain So ce90 
(Gerth et al., J. Antibiotics 49: 560-563 (1996), incorporated herein by reference). The
structure of epothilone A and B wherein R signifies hydrogen (epothilone A) or methyl
(epothilone B) is:



The epothilones have a narrow antifungal spectrum and especially show a high 
cytotoxicity in animal cell cultures (see, Hofle et al., Patent DE 4138042 (1993), incorporated
herein by reference). Of significant importance, epothilones mimic the biological effects of
taxol, both in vivo and in cultured cells (Bollag et al., Cancer Research 55: 2325-2333
(1995), incorporated herein by reference). Taxol and taxotere, which stabilize cellular
microtubules, are cancer chemotherapeutic agents with significant activity against various
human solid tumors (Rowinsky et al., J. Natl. Cancer Inst. 83: 1778-1781 (1991)). Competition
studies have revealed that epothilones act as competitive inhibitors of taxol binding to
microtubules, consistent with the interpretation that they share the same microtubule-binding
site and possess a similar microtubule affinity as taxol. However, epothilones enjoy a
significant advantage over taxol in that epothilones exhibit a much lower drop in potency
compared to taxol against a multiple drug-resistant cell line (Bollag et al. (1995)). Furthermore,
epothilones are considerably less efficiently exported from the cells by P-glycoprotein
than is taxol (Gerth et al. (1996)). In addition, several epothilone analogs have been
synthesized that have a superior cytotoxic activity as compared to epothilone A or epothilone B
as demonstrated by their enhanced ability to induce the polymerization and stabilization of 
microtubules (WO 98/25929, incorporated herein by reference). 

Despite the promise shown by the epothilones as anticancer agents, problems per- 
taining to the production of these compounds presently limit their commercial potential. The 
compounds are too complex for industrial-scale chemical synthesis and so must be produ- 
ced by fermentation. Techniques for the genetic manipulation of myxobacteria such as 
Sorangium cellulosum are described in U.S. Patent No. 5,686,295, incorporated herein by
reference. However, Sorangium cellulosum is notoriously difficult to ferment and production
levels of epothilones are therefore low. Recombinant production of epothilones in heterologous
hosts that are more amenable to fermentation could solve current production problems.
However, the genes that encode the polypeptides responsible for epothilone biosynthesis
have heretofore not been isolated. Furthermore, the strain that produces epothilones,
i.e. So ce90, also produces at least one additional polyketide, spirangien, which
would be expected to greatly complicate the isolation of the genes particularly responsible 
for epothilone biosynthesis. 

Therefore, in view of the foregoing, one object of the present invention is to isolate 
the genes that are involved in the synthesis of epothilones, particularly the genes that are 
involved in the synthesis of epothilones A and B in myxobacteria of the Sorangium/Polyangium
group, i.e., Sorangium cellulosum strain So ce90. A further object of the
invention is to provide a method for the recombinant production of epothilones for application 
in anticancer formulations. 

SUMMARY OF THE INVENTION 

In furtherance of the aforementioned and other objects, the present invention unex- 
pectedly overcomes the difficulties set forth above to provide for the first time a nucleic acid 
molecule comprising a nucleotide sequence that encodes at least one polypeptide involved 
in the biosynthesis of epothilone. In a preferred embodiment, the nucleotide sequence is 
isolated from a species belonging to Myxobacteria, most preferably Sorangium cellulosum. 

In another preferred embodiment, the present invention provides an isolated nucleic 
acid molecule comprising a nucleotide sequence that encodes at least one polypeptide in- 
volved in the biosynthesis of an epothilone, wherein said polypeptide comprises an amino 
acid sequence substantially similar to an amino acid sequence selected from the group consisting
of: SEQ ID NO:2, amino acids 11-437 of SEQ ID NO:2, amino acids 543-864 of SEQ
ID NO:2, amino acids 974-1273 of SEQ ID NO:2, amino acids 1314-1385 of SEQ ID NO:2, 
SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, 
amino acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 
549-565 of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of 
SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, 
amino acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 
1268-1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 
of SEQ ID NO:3, amino acids 1344-1351 of SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 
of SEQ ID NO:4, amino acids 539-859 of SEQ ID NO:4, amino acids 869-1037 of SEQ ID 
NO:4, amino acids 1439-1684 of SEQ ID NO:4, amino acids 1722-1792 of SEQ ID NO:4,
SEQ ID NO:5, amino acids 39-457 of SEQ ID NO:5, amino acids 563-884 of SEQ ID NO:5, 
amino acids 1147-1399 of SEQ ID NO:5, amino acids 1434-1506 of SEQ ID NO:5, amino 
acids 1524-1950 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 
2645-2895 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 3024- 
3449 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 3886-4048 of 
SEQ ID NO:5, amino acids 4433-4719 of SEQ ID NO:5, amino acids 4729-4974 of SEQ ID
NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino acids 5103-5525 of SEQ ID NO:5, 
amino acids 5631-5951 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino 
acids 6542-6837 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino acids 
7140-7211 of SEQ ID NO:5, SEQ ID NO:6, amino acids 35-454 of SEQ ID NO:6, amino 
acids 561-881 of SEQ ID NO:6, amino acids 1143-1393 of SEQ ID NO:6, amino acids 1430- 
1503 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6, amino acids 2053-2373 of
SEQ ID NO:6, amino acids 2383-2551 of SEQ ID NO:6, amino acids 2671-3045 of SEQ ID
NO:6, amino acids 3392-3636 of SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6,
SEQ ID NO:7, amino acids 32-450 of SEQ ID NO:7, amino acids 556-877 of SEQ ID NO:7, 
amino acids 887-1051 of SEQ ID NO:7, amino acids 1478-1790 of SEQ ID NO:7, amino 
acids 1810-2055 of SEQ ID NO:7, amino acids 2093-2164 of SEQ ID NO:7, amino acids 
2165-2439 of SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:11, and SEQ ID 
NO:22. 

In a more preferred embodiment, the present invention provides an isolated nucleic 
acid molecule comprising a nucleotide sequence that encodes at least one polypeptide in- 
volved in the biosynthesis of an epothilone, wherein said polypeptide comprises an amino 
acid sequence selected from the group consisting of: SEQ ID NO:2, amino acids 11-437 of 
SEQ ID NO:2, amino acids 543-864 of SEQ ID NO:2, amino acids 974-1273 of SEQ ID 
NO:2, amino acids 1314-1385 of SEQ ID NO:2, SEQ ID NO:3, amino acids 72-81 of SEQ ID 
NO:3, amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino 
acids 353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 
of SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID 
NO:3, amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino 
acids 918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285- 
1297 of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, amino acids 1344-1351 of 
SEQ ID NO:3, SEQ ID NO:4, amino acids 7-432 of SEQ ID NO:4, amino acids 539-859 of 
SEQ ID NO:4, amino acids 869-1037 of SEQ ID NO:4, amino acids 1439-1684 of SEQ ID 
NO:4, amino acids 1722-1792 of SEQ ID NO:4, SEQ ID NO:5, amino acids 39-457 of SEQ
ID NO:5, amino acids 563-884 of SEQ ID NO:5, amino acids 1147-1399 of SEQ ID NO:5,
amino acids 1434-1506 of SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino 
acids 2056-2377 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, amino acids 
2932-3005 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 3555- 
3876 of SEQ ID NO:5, amino acids 3886-4048 of SEQ ID NO:5, amino acids 4433-4719 of 
SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID 
NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, 
amino acids 5964-6132 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, amino 
acids 6857-7101 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, SEQ ID NO:6, 
amino acids 35-454 of SEQ ID NO:6, amino acids 561-881 of SEQ ID NO:6, amino acids 
1143-1393 of SEQ ID NO:6, amino acids 1430-1503 of SEQ ID NO:6, amino acids 1522-
1946 of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, amino acids 2383-2551 of
SEQ ID NO:6, amino acids 2671-3045 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID 
NO:6, amino acids 3673-3745 of SEQ ID NO:6, SEQ ID NO:7, amino acids 32-450 of SEQ
ID NO:7, amino acids 556-877 of SEQ ID NO:7, amino acids 887-1051 of SEQ ID NO:7, 
amino acids 1478-1790 of SEQ ID NO:7, amino acids 1810-2055 of SEQ ID NO:7, amino 
acids 2093-2164 of SEQ ID NO:7, amino acids 2165-2439 of SEQ ID NO:7, SEQ ID NO:8, 
SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:22. 

In yet another preferred embodiment, the present invention provides an isolated 
nucleic acid molecule comprising a nucleotide sequence that encodes at least one polypep- 
tide involved in the biosynthesis of an epothilone, wherein said nucleotide sequence is 
substantially similar to a nucleotide sequence selected from the group consisting of: the 
complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415-5556 of SEQ ID 
NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of SEQ ID NO:1, 
nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ ID NO:1,
nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID NO:1, 
nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, 
nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, 
nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, 
nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, 
nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, 
nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, 
nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, 
nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1,
nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, 
nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, 
nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, 
nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, 
nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, 
nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, 
nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, 
nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, 
nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, 
nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1,
nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1,
nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1,
nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1,
nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1,
nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1,
nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1,
nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1,
nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1,
nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1,
nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1,
nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1,
nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1,
nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1,
nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.
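
For illustration, nucleotide ranges of the kind recited above may be read out of a sequence as sketched below in Python. The sketch rests on stated assumptions: the ranges are treated as 1-based and inclusive, and "the complement of" a range is read as the reverse complement, i.e. the opposite strand written 5' to 3'. The helper names subsequence, reverse_complement, and span_length, and the short toy sequence, are hypothetical and stand in for SEQ ID NO:1, which is not reproduced here.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive range."""
    if start < 1 or end < start:
        raise ValueError("invalid range")
    return end - start + 1


def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    return seq[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Opposite strand of seq, written 5' to 3' (assumed reading of 'complement')."""
    return seq.translate(_COMPLEMENT)[::-1]


toy = "ATGGCCTTAAGC"                      # toy sequence, not SEQ ID NO:1
forward = subsequence(toy, 4, 9)          # nucleotides 4-9 of the toy sequence
opposite = reverse_complement(forward)    # the same range read from the other strand

# Length arithmetic for one recited range: nucleotides 1900-3171.
n_bases = span_length(1900, 3171)         # 3171 - 1900 + 1
```

Under the inclusive convention assumed here, the range 1900-3171 spans 1272 nucleotides, a multiple of three.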

In an especially preferred embodiment, the present invention provides a nucleic acid 
molecule comprising a nucleotide sequence that encodes at least one polypeptide involved 
in the biosynthesis of an epothilone, wherein said nucleotide sequence is selected from the 
group consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 
3415-5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643- 
8920 of SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 
of SEQ ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of
SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ 
ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID 
NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1,
nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, 
nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1,
nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, 
nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, 
nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1, 
nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, 
nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, 
nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, 
nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, 
nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1,
nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1,
nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, 
nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, 
nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1,
nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, 
nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, 
nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, 
nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, 
nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1,
nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1,
nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, 
nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, 
nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, 
nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1,
nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, 
nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1, 
nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, 
nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, 
nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.

In yet another preferred embodiment, the present invention provides an isolated 
nucleic acid molecule comprising a nucleotide sequence that encodes at least one 
polypeptide involved in the biosynthesis of an epothilone, wherein said nucleotide sequence 
comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide 
portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50
(preferably 20) base pair portion of a nucleotide sequence selected from the group 
consisting of: the complement of nucleotides 1900-3171 of SEQ ID NO:1, nucleotides 3415- 
5556 of SEQ ID NO:1, nucleotides 7610-11875 of SEQ ID NO:1, nucleotides 7643-8920 of
SEQ ID NO:1, nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 10529-11428 of SEQ 
ID NO:1, nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 11872-16104 of SEQ ID 
NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1,
nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, 
nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 13633-13680 of SEQ ID NO:1, 
nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 14313-14334 of SEQ ID NO:1, 
nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 14578-14607 of SEQ ID NO:1, 
nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 15673-15693 of SEQ ID NO:1, 
nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 14788-15639 of SEQ ID NO:1, 
nucleotides 15901-15924 of SEQ ID NO:1, nucleotides 16251-21749 of SEQ ID NO:1,
nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, 
nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 20565-21302 of SEQ ID NO:1, 
nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 21746-43519 of SEQ ID NO:1, 
nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1,
nucleotides 25184-25942 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, 
nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1,
nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, 
nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1,
nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1,
nucleotides 35930-36667 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1,
nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, 
nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, 
nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1,
nucleotides 43524-54920 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1,
nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, 
nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1,
nucleotides 49680-50642 of SEQ ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, 
nucleotides 51534-52657 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, 
nucleotides 54540-54758 of SEQ ID NO:1, nucleotides 54935-62254 of SEQ ID NO:1, 
nucleotides 55028-56284 of SEQ ID NO:1, nucleotides 56600-57565 of SEQ ID NO:1, 
nucleotides 57593-58087 of SEQ ID NO:1, nucleotides 59366-60304 of SEQ ID NO:1,
nucleotides 60362-61099 of SEQ ID NO:1, nucleotides 61211-61426 of SEQ ID NO:1, 
nucleotides 61427-62254 of SEQ ID NO:1, nucleotides 62369-63628 of SEQ ID NO:1, 
nucleotides 67334-68251 of SEQ ID NO:1, and nucleotides 1-68750 of SEQ ID NO:1.
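
The identity criterion recited above, a consecutive portion of 20 (or 25, 30, 35, 40, 45, or 50) bases identical to a respective consecutive portion of a reference sequence, can be checked computationally. The following Python sketch is offered as one hedged way of doing so. The function name shares_identical_portion and the toy sequences are hypothetical; the sketch compares the given strands only, ignores case, and does not address whether the sequence encodes a polypeptide involved in epothilone biosynthesis.

```python
def shares_identical_portion(query: str, reference: str, k: int = 20) -> bool:
    """Return True if some k consecutive bases of query occur identically in reference."""
    if k <= 0:
        raise ValueError("k must be positive")
    query, reference = query.upper(), reference.upper()
    if len(query) < k or len(reference) < k:
        return False
    # Collect every window of k consecutive bases in the reference once,
    # then test each window of the query for membership.
    reference_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(query[i:i + k] in reference_windows
               for i in range(len(query) - k + 1))


# Toy data only; these are not portions of SEQ ID NO:1.
reference = "ATGACCGGTTCAGGCTAACGTTCGATCGGA"      # 30 bases
query = "CC" + reference[3:23] + "TT"             # 20 shared bases with unrelated flanks

has_20 = shares_identical_portion(query, reference)          # default k = 20
has_25 = shares_identical_portion(query, reference, k=25)    # query is only 24 bases long
unrelated = shares_identical_portion("A" * 30, reference)    # no shared 20-base window
```

Storing the reference windows in a set keeps each lookup cheap, which matters when the reference is tens of kilobases long rather than the 30 bases used here.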

The present invention also provides a chimeric gene comprising a heterologous pro- 
moter sequence operatively linked to a nucleic acid molecule of the invention. Further, the 
present invention provides a recombinant vector comprising such a chimeric gene, wherein 
the vector is capable of being stably transformed into a host cell. Still further, the present 
invention provides a recombinant host cell comprising such a chimeric gene, wherein the 
host cell is capable of expressing the nucleotide sequence that encodes at least one poly- 
peptide necessary for the biosynthesis of an epothilone. In a preferred embodiment, the 
recombinant host cell is a bacterium belonging to the order Actinomycetales, and in a more 
preferred embodiment the recombinant host cell is a strain of Streptomyces. In other embo- 
diments, the recombinant host cell is any other bacterium amenable to fermentation, such as 
a pseudomonad or E. coli. Even further, the present invention provides a Bac clone
comprising a nucleic acid molecule of the invention, preferably Bac clone pEPO15.

In another aspect, the present invention provides an isolated nucleic acid molecule 
comprising a nucleotide sequence that encodes an epothilone synthase domain. 

According to one embodiment, the epothilone synthase domain is a β-ketoacyl-synthase
(KS) domain comprising an amino acid sequence substantially similar to an amino acid
sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, amino 
acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524-1950 
of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of SEQ 
ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID NO:6,
and amino acids 32-450 of SEQ ID NO:7. According to this embodiment, said KS domain 
preferably comprises an amino acid sequence selected from the group consisting of: amino 
acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of 
SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID 
NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino 
acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. Also,
according to this embodiment, said nucleotide sequence preferably is substantially similar to 
a nucleotide sequence selected from the group consisting of: nucleotides 7643-8920 of SEQ 
ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860-23116 of SEQ ID 
NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815-32092 of SEQ ID NO:1, 
nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-44885 of SEQ ID NO:1,
nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028-56284 of SEQ ID NO:1. 
According to this embodiment, said nucleotide sequence more preferably comprises a 
consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion 
identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 
20) base pair portion of a nucleotide sequence selected from the group consisting of: 
nucleotides 7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1,
nucleotides 21860-23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, 
nucleotides 30815-32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, 
nucleotides 43626-44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and 
nucleotides 55028-56284 of SEQ ID NO:1. In addition, according to this embodiment, said 
nucleotide sequence most preferably is selected from the group consisting of: nucleotides 
7643-8920 of SEQ ID NO:1, nucleotides 16269-17546 of SEQ ID NO:1, nucleotides 21860- 
23116 of SEQ ID NO:1, nucleotides 26318-27595 of SEQ ID NO:1, nucleotides 30815- 
32092 of SEQ ID NO:1, nucleotides 37052-38320 of SEQ ID NO:1, nucleotides 43626-
44885 of SEQ ID NO:1, nucleotides 48087-49361 of SEQ ID NO:1, and nucleotides 55028- 
56284 of SEQ ID NO:1. 

According to another embodiment, the epothilone synthase domain is an acyltrans- 
ferase (AT) domain comprising an amino acid sequence substantially similar to an amino 
acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, 
amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids
2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631- 
5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of 
SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. According to this embodiment, 
said AT domain preferably comprises an amino acid sequence selected from the group 
consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, 
amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 
3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 
of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ 
ID NO:7. Also, according to this embodiment, said nucleotide sequence preferably is 
substantially similar to a nucleotide sequence selected from the group consisting of: 
nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of SEQ ID NO:1, 
nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ ID NO:1, 
nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID NO:1, 
nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1, and
nucleotides 56600-57565 of SEQ ID NO:1. According to this embodiment, said nucleotide
sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 
20) base pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 
30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence selected 
from the group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865- 
18827 of SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911- 
28876 of SEQ ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636- 
39598 of SEQ ID NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-
50642 of SEQ ID NO:1, and nucleotides 56600-57565 of SEQ ID NO:1. In addition, according
to this embodiment, said nucleotide sequence most preferably is selected from the
group consisting of: nucleotides 9236-10201 of SEQ ID NO:1, nucleotides 17865-18827 of
SEQ ID NO:1, nucleotides 23431-24397 of SEQ ID NO:1, nucleotides 27911-28876 of SEQ
ID NO:1, nucleotides 32408-33373 of SEQ ID NO:1, nucleotides 38636-39598 of SEQ ID
NO:1, nucleotides 45204-46166 of SEQ ID NO:1, nucleotides 49680-50642 of SEQ ID NO:1,
and nucleotides 56600-57565 of SEQ ID NO:1.

According to still another embodiment, the epothilone synthase domain is an enoyl 
reductase (ER) domain comprising an amino acid sequence substantially similar to an amino 
acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID 
NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, 
and amino acids 1478-1790 of SEQ ID NO:7. According to this embodiment, said ER do- 
main preferably comprises an amino acid sequence selected from the group consisting of: 
amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino 
acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. Also, ac- 
cording to this embodiment, said nucleotide sequence preferably is substantially similar to a 
nucleotide sequence selected from the group consisting of: nucleotides 10529-11428 of 
SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ 
ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1. According to this embodiment, said 
nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 
(preferably 20) base pair nucleotide portion identical in sequence to a respective consecutive 
20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a nucleotide sequence 
selected from the group consisting of: nucleotides 10529-11428 of SEQ ID NO:1, 
nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 41369-42256 of SEQ ID NO:1, and 
nucleotides 59366-60304 of SEQ ID NO:1. In addition, according to this embodiment, said 
nucleotide sequence most preferably is selected from the group consisting of: nucleotides 
10529-11428 of SEQ ID NO:1, nucleotides 35042-35902 of SEQ ID NO:1, nucleotides 
41369-42256 of SEQ ID NO:1, and nucleotides 59366-60304 of SEQ ID NO:1. 

According to another embodiment, the epothilone synthase domain is an acyl carrier 
protein (ACP) domain, wherein said polypeptide comprises an amino acid sequence 
substantially similar to an amino acid sequence selected from the group consisting of: amino 
acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 
1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010- 
5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of 
SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ 
ID NO:7. According to this embodiment, said ACP domain preferably comprises an amino 
acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID 
NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, 
amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino 
acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 
3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. Also, according to 
this embodiment, said nucleotide sequence preferably is substantially similar to a nucleotide 
sequence selected from the group consisting of: nucleotides 11549-11764 of SEQ ID NO:1, 
nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 26045-26263 of SEQ ID NO:1, 
nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 36773-36991 of SEQ ID NO:1, 
nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 47811-48032 of SEQ ID NO:1, 
nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides 61211-61426 of SEQ ID NO:1. 
According to this embodiment, said nucleotide sequence more preferably comprises a 
consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion 
identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 
20) base pair portion of a nucleotide sequence selected from the group consisting of: 
nucleotides 11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, 
nucleotides 26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, 
nucleotides 36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, 
nucleotides 47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and 
nucleotides 61211-61426 of SEQ ID NO:1. In addition, according to this embodiment, said 
nucleotide sequence most preferably is selected from the group consisting of: nucleotides 
11549-11764 of SEQ ID NO:1, nucleotides 21414-21626 of SEQ ID NO:1, nucleotides 
26045-26263 of SEQ ID NO:1, nucleotides 30539-30759 of SEQ ID NO:1, nucleotides 
36773-36991 of SEQ ID NO:1, nucleotides 43163-43378 of SEQ ID NO:1, nucleotides 
47811-48032 of SEQ ID NO:1, nucleotides 54540-54758 of SEQ ID NO:1, and nucleotides 
61211-61426 of SEQ ID NO:1. 

According to another embodiment, the epothilone synthase domain is a dehydratase 
(DH) domain comprising an amino acid sequence substantially similar to an amino acid se- 
quence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino 
acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 
2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. According to this 
embodiment, said DH domain preferably comprises an amino acid sequence selected from 
the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of 
SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID 
NO:6, and amino acids 887-1051 of SEQ ID NO:7. Also, according to this embodiment, said 
nucleotide sequence preferably is substantially similar to a nucleotide sequence selected 
from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1, nucleotides 33401- 
33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, nucleotides 50670- 
51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1. According to this 
embodiment, said nucleotide sequence more preferably comprises a consecutive 20, 25, 30, 
35, 40, 45, or 50 (preferably 20) base pair nucleotide portion identical in sequence to a 
respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair portion of a 
nucleotide sequence selected from the group consisting of: nucleotides 18855-19361 of 
SEQ ID NO:1, nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ 
ID NO:1, nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ 
ID NO:1. In addition, according to this embodiment, said nucleotide sequence most pre- 
ferably is selected from the group consisting of: nucleotides 18855-19361 of SEQ ID NO:1, 
nucleotides 33401-33889 of SEQ ID NO:1, nucleotides 39635-40141 of SEQ ID NO:1, 
nucleotides 50670-51176 of SEQ ID NO:1, and nucleotides 57593-58087 of SEQ ID NO:1. 

According to yet another embodiment, the epothilone synthase domain is a β-keto- 
reductase (KR) domain comprising an amino acid sequence substantially similar to an amino 
acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID 
NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, 
amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino 
acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 
1810-2055 of SEQ ID NO:7. According to this embodiment, said KR domain preferably 
comprises an amino acid sequence selected from the group consisting of: amino acids 1439- 
1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of 
SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID 
NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, 
and amino acids 1810-2055 of SEQ ID NO:7. Also, according to this embodiment, said 
nucleotide sequence preferably is substantially similar to a nucleotide sequence selected 
from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184- 
25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930- 
36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950- 
47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362- 
61099 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence more 
preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair 
nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, 
or 50 (preferably 20) base pair portion of a nucleotide sequence selected from the group 
consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucleotides 25184-25942 of SEQ 
ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 35930-36667 of SEQ ID 
NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 46950-47702 of SEQ ID NO:1, 
nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 60362-61099 of SEQ ID NO:1. 
In addition, according to this embodiment, said nucleotide sequence most preferably is 
selected from the group consisting of: nucleotides 20565-21302 of SEQ ID NO:1, nucle- 
otides 25184-25942 of SEQ ID NO:1, nucleotides 29678-30429 of SEQ ID NO:1, nucleotides 
35930-36667 of SEQ ID NO:1, nucleotides 42314-43048 of SEQ ID NO:1, nucleotides 
46950-47702 of SEQ ID NO:1, nucleotides 53697-54431 of SEQ ID NO:1, and nucleotides 
60362-61099 of SEQ ID NO:1. 

According to an additional embodiment, the epothilone synthase domain is a 
methyltransferase (MT) domain comprising an amino acid sequence substantially similar to 
amino acids 2671-3045 of SEQ ID NO:6. According to this embodiment, said MT domain 
preferably comprises amino acids 2671-3045 of SEQ ID NO:6. Also, according to this 
embodiment, said nucleotide sequence preferably is substantially similar to nucleotides 
51534-52657 of SEQ ID NO:1. According to this embodiment, said nucleotide sequence 
more preferably comprises a consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base 
pair nucleotide portion identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 
45, or 50 (preferably 20) base pair portion of nucleotides 51534-52657 of SEQ ID NO:1. In 
addition, according to this embodiment, said nucleotide sequence most preferably is nucleo- 
tides 51534-52657 of SEQ ID NO:1. 

According to another embodiment, the epothilone synthase domain is a thioesterase 
(TE) domain comprising an amino acid sequence substantially similar to amino acids 2165- 
2439 of SEQ ID NO:7. According to this embodiment, said TE domain preferably comprises 
amino acids 2165-2439 of SEQ ID NO:7. Also, according to this embodiment, said nucleo- 
tide sequence preferably is substantially similar to nucleotides 61427-62254 of SEQ ID 
NO:1. According to this embodiment, said nucleotide sequence more preferably comprises a 
consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion iden- 
tical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) 
base pair portion of nucleotides 61427-62254 of SEQ ID NO:1. In addition, according to this 
embodiment, said nucleotide sequence most preferably is nucleotides 61427-62254 of SEQ 
ID NO:1. 

In still another aspect, the present invention provides an isolated nucleic acid mole- 
cule comprising a nucleotide sequence that encodes a non-ribosomal peptide synthetase, 
wherein said non-ribosomal peptide synthetase comprises an amino acid sequence 
substantially similar to an amino acid sequence selected from the group consisting of: SEQ 
ID NO:3, amino acids 72-81 of SEQ ID NO:3, amino acids 118-125 of SEQ ID NO:3, amino 
acids 199-212 of SEQ ID NO:3, amino acids 353-363 of SEQ ID NO:3, amino acids 549-565 
of SEQ ID NO:3, amino acids 588-603 of SEQ ID NO:3, amino acids 669-684 of SEQ ID 
NO:3, amino acids 815-821 of SEQ ID NO:3, amino acids 868-892 of SEQ ID NO:3, amino 
acids 903-912 of SEQ ID NO:3, amino acids 918-940 of SEQ ID NO:3, amino acids 1268- 
1274 of SEQ ID NO:3, amino acids 1285-1297 of SEQ ID NO:3, amino acids 973-1256 of 
SEQ ID NO:3, and amino acids 1344-1351 of SEQ ID NO:3. According to this embodiment, 
said non-ribosomal peptide synthetase preferably comprises an amino acid sequence 
selected from the group consisting of: SEQ ID NO:3, amino acids 72-81 of SEQ ID NO:3, 
amino acids 118-125 of SEQ ID NO:3, amino acids 199-212 of SEQ ID NO:3, amino acids 
353-363 of SEQ ID NO:3, amino acids 549-565 of SEQ ID NO:3, amino acids 588-603 of 
SEQ ID NO:3, amino acids 669-684 of SEQ ID NO:3, amino acids 815-821 of SEQ ID NO:3, 
amino acids 868-892 of SEQ ID NO:3, amino acids 903-912 of SEQ ID NO:3, amino acids 
918-940 of SEQ ID NO:3, amino acids 1268-1274 of SEQ ID NO:3, amino acids 1285-1297 
of SEQ ID NO:3, amino acids 973-1256 of SEQ ID NO:3, and amino acids 1344-1351 of 
SEQ ID NO:3. Also, according to this embodiment, said nucleotide sequence preferably is 
substantially similar to a nucleotide sequence selected from the group consisting of: 
nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, 
nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, 
nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, 
nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, 
nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, 
nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, 
nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, 
nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1. 
According to this embodiment, said nucleotide sequence more preferably comprises a 
consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 20) base pair nucleotide portion 
identical in sequence to a respective consecutive 20, 25, 30, 35, 40, 45, or 50 (preferably 
20) base pair portion of a nucleotide sequence selected from the group consisting of: nucle- 
otides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of SEQ ID NO:1, nucleotides 
12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ ID NO:1, nucleotides 
12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID NO:1, nucleotides 
13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, nucleotides 
14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, nucleotides 
14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, nucleotides 
15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, nucleotides 
14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1. In addition, 
according to this embodiment, said nucleotide sequence most preferably is selected from the 
group consisting of: nucleotides 11872-16104 of SEQ ID NO:1, nucleotides 12085-12114 of 
SEQ ID NO:1, nucleotides 12223-12246 of SEQ ID NO:1, nucleotides 12466-12507 of SEQ 
ID NO:1, nucleotides 12928-12960 of SEQ ID NO:1, nucleotides 13516-13566 of SEQ ID 
NO:1, nucleotides 13633-13680 of SEQ ID NO:1, nucleotides 13876-13923 of SEQ ID NO:1, 
nucleotides 14313-14334 of SEQ ID NO:1, nucleotides 14473-14547 of SEQ ID NO:1, 
nucleotides 14578-14607 of SEQ ID NO:1, nucleotides 14623-14692 of SEQ ID NO:1, 
nucleotides 15673-15693 of SEQ ID NO:1, nucleotides 15724-15762 of SEQ ID NO:1, 
nucleotides 14788-15639 of SEQ ID NO:1, and nucleotides 15901-15924 of SEQ ID NO:1. 
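
By way of illustration only, the recurring criterion of a consecutive 20, 25, 30, 35, 40, 45, or 50 base pair portion identical in sequence to a respective portion of a reference sequence can be checked computationally. The following Python sketch is not part of the disclosed methods: the function name `shares_identical_window` and the toy sequences are hypothetical, and the comparison of a single strand only is an assumption of the sketch.

```python
def shares_identical_window(candidate: str, reference: str, k: int = 20) -> bool:
    """Return True if any k consecutive bases of `candidate` occur verbatim in `reference`.

    Assumption: both sequences are given 5' to 3' on the same strand.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    candidate = candidate.upper()
    reference = reference.upper()
    if len(candidate) < k or len(reference) < k:
        return False
    # Collect every k-base window of the reference once, then probe with the candidate.
    reference_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(candidate[i:i + k] in reference_windows
               for i in range(len(candidate) - k + 1))


if __name__ == "__main__":
    # Toy sequences (hypothetical, not taken from SEQ ID NO:1).
    toy_reference = "ACGTTGCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCC"
    toy_candidate = "TTTT" + toy_reference[5:25] + "AAAA"
    for k in (20, 25):
        print(k, shares_identical_window(toy_candidate, toy_reference, k))
```

In practice the reference would be one of the recited nucleotide ranges of SEQ ID NO:1, and `k` would be set to the desired portion length, preferably 20.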

The present invention further provides an isolated nucleic acid molecule comprising a 
nucleotide sequence that encodes a polypeptide comprising an amino acid sequence 
selected from the group consisting of SEQ ID NOs:2-23. 

In accordance with another aspect, the present invention also provides methods for 
the recombinant production of polyketides such as epothilones in quantities large enough to 
enable their purification and use in pharmaceutical formulations such as those for the treat- 
ment of cancer. A specific advantage of these production methods is the chirality of the 
molecules produced; production in transgenic organisms avoids the generation of popu- 
lations of racemic mixtures, within which some enantiomers may have reduced activity. In 
particular, the present invention provides a method for heterologous expression of epothilone 
in a recombinant host, comprising: (a) introducing into a host a chimeric gene comprising a 
heterologous promoter sequence operatively linked to a nucleic acid molecule of the 
invention that comprises a nucleotide sequence that encodes at least one polypeptide in- 
volved in the biosynthesis of epothilone; and (b) growing the host in conditions that allow 
biosynthesis of epothilone in the host. The present invention also provides a method for 
producing epothilone, comprising: (a) expressing epothilone in a recombinant host by the 
aforementioned method; and (b) extracting epothilone from the recombinant host. 

According to still another aspect, the present invention provides an isolated polypep- 
tide comprising an amino acid sequence that consists of an epothilone synthase domain. 

According to one embodiment, the epothilone synthase domain is a β-ketoacyl- 
synthase (KS) domain comprising an amino acid sequence substantially similar to an amino 
acid sequence selected from the group consisting of: amino acids 11-437 of SEQ ID NO:2, 
amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of SEQ ID NO:5, amino acids 1524- 
1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID NO:5, amino acids 5103-5525 of 
SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino acids 1522-1946 of SEQ ID 
NO:6, and amino acids 32-450 of SEQ ID NO:7. According to this embodiment, said KS domain 
preferably comprises an amino acid sequence selected from the group consisting of: amino 
acids 11-437 of SEQ ID NO:2, amino acids 7-432 of SEQ ID NO:4, amino acids 39-457 of 
SEQ ID NO:5, amino acids 1524-1950 of SEQ ID NO:5, amino acids 3024-3449 of SEQ ID 
NO:5, amino acids 5103-5525 of SEQ ID NO:5, amino acids 35-454 of SEQ ID NO:6, amino 
acids 1522-1946 of SEQ ID NO:6, and amino acids 32-450 of SEQ ID NO:7. 

According to another embodiment, the epothilone synthase domain is an acyltrans- 
ferase (AT) domain comprising an amino acid sequence substantially similar to an amino 
acid sequence selected from the group consisting of: amino acids 543-864 of SEQ ID NO:2, 
amino acids 539-859 of SEQ ID NO:4, amino acids 563-884 of SEQ ID NO:5, amino acids 
2056-2377 of SEQ ID NO:5, amino acids 3555-3876 of SEQ ID NO:5, amino acids 5631- 
5951 of SEQ ID NO:5, amino acids 561-881 of SEQ ID NO:6, amino acids 2053-2373 of 
SEQ ID NO:6, and amino acids 556-877 of SEQ ID NO:7. According to this embodiment, 
said AT domain preferably comprises an amino acid sequence selected from the group 
consisting of: amino acids 543-864 of SEQ ID NO:2, amino acids 539-859 of SEQ ID NO:4, 
amino acids 563-884 of SEQ ID NO:5, amino acids 2056-2377 of SEQ ID NO:5, amino acids 
3555-3876 of SEQ ID NO:5, amino acids 5631-5951 of SEQ ID NO:5, amino acids 561-881 
of SEQ ID NO:6, amino acids 2053-2373 of SEQ ID NO:6, and amino acids 556-877 of SEQ 
ID NO:7. 

According to still another embodiment, the epothilone synthase domain is an enoyl 
reductase (ER) domain comprising an amino acid sequence substantially similar to an amino 
acid sequence selected from the group consisting of: amino acids 974-1273 of SEQ ID 
NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino acids 6542-6837 of SEQ ID NO:5, 
and amino acids 1478-1790 of SEQ ID NO:7. According to this embodiment, said ER do- 
main preferably comprises an amino acid sequence selected from the group consisting of: 
amino acids 974-1273 of SEQ ID NO:2, amino acids 4433-4719 of SEQ ID NO:5, amino 
acids 6542-6837 of SEQ ID NO:5, and amino acids 1478-1790 of SEQ ID NO:7. 

According to another embodiment, the epothilone synthase domain is an acyl carrier 
protein (ACP) domain, wherein said polypeptide comprises an amino acid sequence 
substantially similar to an amino acid sequence selected from the group consisting of: amino 
acids 1314-1385 of SEQ ID NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 
1434-1506 of SEQ ID NO:5, amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010- 
5082 of SEQ ID NO:5, amino acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of 
SEQ ID NO:6, amino acids 3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ 
ID NO:7. According to this embodiment, said ACP domain preferably comprises an amino 
acid sequence selected from the group consisting of: amino acids 1314-1385 of SEQ ID 
NO:2, amino acids 1722-1792 of SEQ ID NO:4, amino acids 1434-1506 of SEQ ID NO:5, 
amino acids 2932-3005 of SEQ ID NO:5, amino acids 5010-5082 of SEQ ID NO:5, amino 
acids 7140-7211 of SEQ ID NO:5, amino acids 1430-1503 of SEQ ID NO:6, amino acids 
3673-3745 of SEQ ID NO:6, and amino acids 2093-2164 of SEQ ID NO:7. 

According to another embodiment, the epothilone synthase domain is a dehydratase 
(DH) domain comprising an amino acid sequence substantially similar to an amino acid se- 
quence selected from the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino 
acids 3886-4048 of SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 
2383-2551 of SEQ ID NO:6, and amino acids 887-1051 of SEQ ID NO:7. According to this 
embodiment, said DH domain preferably comprises an amino acid sequence selected from 
the group consisting of: amino acids 869-1037 of SEQ ID NO:4, amino acids 3886-4048 of 
SEQ ID NO:5, amino acids 5964-6132 of SEQ ID NO:5, amino acids 2383-2551 of SEQ ID 
NO:6, and amino acids 887-1051 of SEQ ID NO:7. 

According to yet another embodiment, the epothilone synthase domain is a β-keto- 
reductase (KR) domain comprising an amino acid sequence substantially similar to an amino 
acid sequence selected from the group consisting of: amino acids 1439-1684 of SEQ ID 
NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of SEQ ID NO:5, 
amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID NO:5, amino 
acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, and amino acids 
1810-2055 of SEQ ID NO:7. According to this embodiment, said KR domain preferably 
comprises an amino acid sequence selected from the group consisting of: amino acids 1439- 
1684 of SEQ ID NO:4, amino acids 1147-1399 of SEQ ID NO:5, amino acids 2645-2895 of 
SEQ ID NO:5, amino acids 4729-4974 of SEQ ID NO:5, amino acids 6857-7101 of SEQ ID 
NO:5, amino acids 1143-1393 of SEQ ID NO:6, amino acids 3392-3636 of SEQ ID NO:6, 
and amino acids 1810-2055 of SEQ ID NO:7. 

According to an additional embodiment, the epothilone synthase domain is a methyl- 
transferase (MT) domain comprising an amino acid sequence substantially similar to amino 
acids 2671-3045 of SEQ ID NO:6. According to this embodiment, said MT domain preferably 
comprises amino acids 2671-3045 of SEQ ID NO:6. 

According to another embodiment, the epothilone synthase domain is a thioesterase 
(TE) domain comprising an amino acid sequence substantially similar to amino acids 2165- 
2439 of SEQ ID NO:7. According to this embodiment, said TE domain preferably comprises 
amino acids 2165-2439 of SEQ ID NO:7. 

Other aspects and advantages of the present invention will become apparent to those 
skilled in the art from a study of the following description of the invention and non-limiting 
examples. 

DEFINITIONS 

In describing the present invention, the following terms will be employed, and are 
intended to be defined as indicated below. 

Associated With / Operatively Linked: Refers to two DNA sequences that are related 
physically or functionally. For example, a promoter or regulatory DNA sequence is said to be 
"associated with" a DNA sequence that codes for an RNA or a protein if the two sequences 
are operatively linked, or situated such that the regulator DNA sequence will affect the 
expression level of the coding or structural DNA sequence. 

Chimeric Gene: A recombinant DNA sequence in which a promoter or regulatory 
DNA sequence is operatively linked to, or associated with, a DNA sequence that codes for 
an mRNA or which is expressed as a protein, such that the regulator DNA sequence is able 
to regulate transcription or expression of the associated DNA sequence. The regulator DNA 
sequence of the chimeric gene is not normally operatively linked to the associated DNA 
sequence as found in nature. 

Coding DNA Sequence: A DNA sequence that is translated in an organism to pro- 
duce a protein. 

Domain: That part of a polyketide synthase necessary for a given distinct activity. 
Examples include acyl carrier protein (ACP), β-ketosynthase (KS), acyltransferase (AT), β- 
ketoreductase (KR), dehydratase (DH), enoylreductase (ER), and thioesterase (TE) 
domains. 

Epothilones: 16-membered macrocyclic polyketides naturally produced by the bacte- 
rium Sorangium cellulosum strain So ce90, which mimic the biological effects of taxol. In 
this application, "epothilone" refers to the class of polyketides that includes epothilone A and 
epothilone B, as well as analogs thereof such as those described in WO 98/25929. 

Epothilone Synthase: A polyketide synthase responsible for the biosynthesis of epo- 
thilone. 

Gene: A defined region that is located within a genome and that, besides the afore- 
mentioned coding DNA sequence, comprises other, primarily regulatory, DNA sequences 
responsible for the control of the expression, that is to say the transcription and translation, 
of the coding portion. 

Heterologous DNA Sequence: A DNA sequence not naturally associated with a host 
cell into which it is introduced, including non-naturally occurring multiple copies of a naturally 
occurring DNA sequence. 

Homologous DNA Sequence: A DNA sequence naturally associated with a host cell 
into which it is introduced. 

Homologous Recombination: Reciprocal exchange of DNA fragments between 
homologous DNA molecules. 

Isolated: In the context of the present invention, an isolated nucleic acid molecule or 
an isolated enzyme is a nucleic acid molecule or enzyme that, by the hand of man, exists 
apart from its native environment and is therefore not a product of nature. An isolated 
nucleic acid molecule or enzyme may exist in a purified form or may exist in a non-native 
environment such as, for example, a recombinant host cell. 

Module: A genetic element encoding all of the distinct activities required in a single 
round of polyketide biosynthesis, i.e., one condensation step and all the β-carbonyl pro- 
cessing steps associated therewith. Each module encodes an ACP, a KS, and an AT 
activity to accomplish the condensation portion of the biosynthesis, and selected post- 
condensation activities to effect the β-carbonyl processing. 
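
For illustration, the definition of a module can be read as a small rule set: the ACP, KS, and AT activities are required for condensation, while the KR, DH, and ER activities determine how far the β-carbonyl is processed. The following Python sketch is a simplified reading only; the names `is_complete_module` and `beta_carbon_state` are hypothetical, and the sketch assumes that every domain present is active and that reduction proceeds in the order KR, DH, ER.

```python
CONDENSATION_DOMAINS = frozenset({"KS", "AT", "ACP"})


def is_complete_module(domains) -> bool:
    """True if the domain set contains the ACP, KS, and AT activities required for condensation."""
    return CONDENSATION_DOMAINS <= set(domains)


def beta_carbon_state(domains) -> str:
    """Predict the state of the beta-carbon after one round of chain extension.

    Assumption: KR, DH, and ER act in that order and each needs its predecessor.
    """
    domains = set(domains)
    if not is_complete_module(domains):
        raise ValueError("a module needs KS, AT, and ACP activities")
    if "KR" not in domains:
        return "keto"           # beta-carbonyl left unprocessed
    if "DH" not in domains:
        return "hydroxyl"       # keto group reduced by KR
    if "ER" not in domains:
        return "enoyl"          # hydroxyl eliminated by DH
    return "fully reduced"      # double bond reduced by ER


if __name__ == "__main__":
    print(beta_carbon_state({"KS", "AT", "ACP", "KR", "DH"}))
```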

NRPS: A non-ribosomal polypeptide synthetase, which is a complex of enzymatic 
activities responsible for the incorporation of amino acids into secondary metabolites in- 
cluding, for example, amino acid adenylation, epimerization, N-methylation, cyclization, 
peptidyl carrier protein, and condensation domains. A functional NRPS is one that catalyzes 
the incorporation of an amino acid into a secondary metabolite. 

NRPS gene: One or more genes encoding NRPSs for producing functional secon- 
dary metabolites, e.g., epothilones A and B, when under the direction of one or more com- 
patible control elements. 

Nucleic Acid Molecule: A linear segment of single- or double-stranded DNA or RNA 
that can be isolated from any source. In the context of the present invention, the nucleic 
acid molecule is preferably a segment of DNA. 

ORF: Open Reading Frame. 

PKS: A polyketide synthase, which is a complex of enzymatic activities (domains) 
responsible for the biosynthesis of polyketides including, for example, ketoreductase, dehy- 
dratase, acyl carrier protein, enoylreductase, ketoacyl ACP synthase, and acyltransferase. A 
functional PKS is one that catalyzes the synthesis of a polyketide. 

PKS Genes: One or more genes encoding various polypeptides required for produ- 
cing functional polyketides, e.g., epothilones A and B, when under the direction of one or 
more compatible control elements. 

Substantially Similar: With respect to nucleic acids, a nucleic acid molecule that has 
at least 60 percent sequence identity with a reference nucleic acid molecule. In a preferred 
embodiment, a substantially similar DNA sequence is at least 80% identical to a reference 
DNA sequence; in a more preferred embodiment, a substantially similar DNA sequence is at 
least 90% identical to a reference DNA sequence; and in a most preferred embodiment, a 
substantially similar DNA sequence is at least 95% identical to a reference DNA sequence. 
A substantially similar DNA sequence preferably encodes a protein or peptide having 
substantially the same activity as the protein or peptide encoded by the reference DNA 
sequence. A substantially similar nucleotide sequence typically hybridizes to a reference 
nucleic acid molecule, or fragments thereof, under the following conditions: hybridization at 
7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 pH 7.0, 1 mM EDTA at 50°C; wash with 2X 
SSC, 1% SDS, at 50°C. With respect to proteins or peptides, a substantially similar amino 
acid sequence is an amino acid sequence that is at least 90% identical to the amino acid 
sequence of a reference protein or peptide and has substantially the same activity as the 
reference protein or peptide. 
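
For illustration, the identity thresholds in the foregoing definition can be expressed as a simple classification. The following Python sketch is offered under stated assumptions: the text does not prescribe an alignment method, so the sketch takes two sequences that have already been aligned to equal length (gaps written as "-"), counts identity over all aligned columns, and does not model the activity requirement. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumption: inputs are pre-aligned, equal in length, with '-' marking gaps.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def nucleic_acid_similarity_tier(identity: float) -> str:
    """Map a percent identity onto the tiers recited for nucleic acids."""
    if identity >= 95:
        return "most preferred"
    if identity >= 90:
        return "more preferred"
    if identity >= 80:
        return "preferred"
    if identity >= 60:
        return "substantially similar"
    return "not substantially similar"


def protein_meets_identity_threshold(identity: float) -> bool:
    """The 90% identity floor recited for proteins or peptides (activity is not checked here)."""
    return identity >= 90


if __name__ == "__main__":
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")  # toy aligned sequences
    print(identity, nucleic_acid_similarity_tier(identity))
```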

Transformation: A process for introducing heterologous nucleic acid into a host cell or 
organism. 

Transformed / Transgenic / Recombinant: Refers to a host organism such as a bac- 
terium into which a heterologous nucleic acid molecule has been introduced. The nucleic 
acid molecule can be stably integrated into the genome of the host or the nucleic acid mo- 
lecule can also be present as an extrachromosomal molecule. Such an extrachromosomal 
molecule can be auto-replicating. Transformed cells, tissues, or plants are understood to 
encompass not only the end product of a transformation process, but also transgenic pro- 
geny thereof. A "non-transformed", "non-transgenic", or "non-recombinant" host refers to a 
wild-type organism, i.e., a bacterium, which does not contain the heterologous nucleic acid 
molecule. 

Nucleotides are indicated by their bases by the following standard abbreviations: 
adenine (A), cytosine (C), thymine (T), and guanine (G). Amino acids are likewise indicated 
by the following standard abbreviations: alanine (Ala; A), arginine (Arg; R), asparagine (Asn; 
N), aspartic acid (Asp; D), cysteine (Cys; C), glutamine (Gln; Q), glutamic acid (Glu; E), 
glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), 
methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; 
T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V). Furthermore, (Xaa; X) 
represents any amino acid. 
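
For convenience, the amino acid abbreviations listed above can be held in a lookup table for converting between the three-letter and one-letter notations. The following Python sketch is illustrative only; the names `THREE_TO_ONE` and `to_one_letter` are hypothetical.

```python
# Three-letter to one-letter amino acid codes, as listed in the text above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any amino acid
}


def to_one_letter(residues) -> str:
    """Convert an iterable of three-letter codes (any letter case) to a one-letter string."""
    return "".join(THREE_TO_ONE[residue.capitalize()] for residue in residues)


if __name__ == "__main__":
    print(to_one_letter("Met-Gly-Xaa-Lys".split("-")))
```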

DESCRIPTION OF THE SEQUENCES IN THE SEQUENCE LISTING 

SEQ ID NO:1 is the nucleotide sequence of a 68750 bp contig containing 22 open 
reading frames (ORFs), which comprises the epothilone biosynthesis genes. 

SEQ ID NO:2 is the protein sequence of a type I polyketide synthase (EPOS A) 
encoded by epoA (nucleotides 7610-11875 of SEQ ID NO:1). 

SEQ ID NO:3 is the protein sequence of a non-ribosomal peptide synthetase (EPOS 
P) encoded by epoP (nucleotides 11872-16104 of SEQ ID NO:1). 

SEQ ID NO:4 is the protein sequence of a type I polyketide synthase (EPOS B) 
encoded by epoB (nucleotides 16251-21749 of SEQ ID NO:1). 

SEQ ID NO:5 is the protein sequence of a type I polyketide synthase (EPOS C) 
encoded by epoC (nucleotides 21746-43519 of SEQ ID NO:1). 

SEQ ID NO:6 is the protein sequence of a type I polyketide synthase (EPOS D) 
encoded by epoD (nucleotides 43524-54920 of SEQ ID NO:1). 

SEQ ID NO:7 is the protein sequence of a type I polyketide synthase (EPOS E) 
encoded by epoE (nucleotides 54935-62254 of SEQ ID NO:1). 

SEQ ID NO:8 is the protein sequence of a cytochrome P450 oxygenase homologue 
(EPOS F) encoded by epoF (nucleotides 62369-63628 of SEQ ID NO:1). 

SEQ ID NO:9 is a partial protein sequence (partial Orf 1) encoded by orf1 (nucleotides
1-1826 of SEQ ID NO:1).

SEQ ID NO:10 is a protein sequence (Orf 2) encoded by orf2 (nucleotides 3171-1900
on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:11 is a protein sequence (Orf 3) encoded by orf3 (nucleotides 3415-5556
of SEQ ID NO:1).

SEQ ID NO:12 is a protein sequence (Orf 4) encoded by orf4 (nucleotides 5992-5612
on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:13 is a protein sequence (Orf 5) encoded by orf5 (nucleotides 6226-6675
of SEQ ID NO:1).

SEQ ID NO:14 is a protein sequence (Orf 6) encoded by orf6 (nucleotides 63779-
64333 of SEQ ID NO:1).

SEQ ID NO:15 is a protein sequence (Orf 7) encoded by orf7 (nucleotides 64290-
63853 on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:16 is a protein sequence (Orf 8) encoded by orf8 (nucleotides 64363-
64920 of SEQ ID NO:1).

SEQ ID NO:17 is a protein sequence (Orf 9) encoded by orf9 (nucleotides 64727-
64287 on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:18 is a protein sequence (Orf 10) encoded by orf10 (nucleotides 65063-
65767 of SEQ ID NO:1).

SEQ ID NO:19 is a protein sequence (Orf 11) encoded by orf11 (nucleotides 65874-
65008 on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:20 is a protein sequence (Orf 12) encoded by orf12 (nucleotides 66338-
65871 on the reverse complement strand of SEQ ID NO:1).

SEQ ID NO:21 is a protein sequence (Orf 13) encoded by orf13 (nucleotides 66667-
67137 of SEQ ID NO:1).

SEQ ID NO:22 is a protein sequence (Orf 14) encoded by orf14 (nucleotides 67334-
68251 of SEQ ID NO:1).

SEQ ID NO:23 is a partial protein sequence (partial Orf 15) encoded by orf15 
(nucleotides 68346-68750 of SEQ ID NO:1). 

SEQ ID NO:24 is the universal reverse PCR primer sequence. 

SEQ ID NO:25 is the universal forward PCR primer sequence. 

SEQ ID NO:26 is the NH24 end "B" PCR primer sequence. 

SEQ ID NO:27 is the NH2 end "A" PCR primer sequence. 

SEQ ID NO:28 is the NH2 end "B" PCR primer sequence. 

SEQ ID NO:29 is the pEPO15-NH6 end "B" PCR primer sequence.

SEQ ID NO:30 is the pEPO15-H2.7 end "A" PCR primer sequence.
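
The nucleotide ranges given in the sequence descriptions above can be checked for internal consistency. The following minimal Python sketch assumes that each range is inclusive and that a start coordinate larger than the end coordinate denotes the reverse complement strand, as the descriptions indicate; the names ORF_COORDS and describe_orf are hypothetical. The count of codon positions includes a stop codon if one lies within the stated range.

```python
# (start, end) exactly as written in the sequence descriptions.
# start > end marks an ORF on the reverse complement strand.
ORF_COORDS = {
    "epoA": (7610, 11875),
    "epoP": (11872, 16104),
    "epoB": (16251, 21749),
    "epoC": (21746, 43519),
    "epoD": (43524, 54920),
    "epoE": (54935, 62254),
    "epoF": (62369, 63628),
    "orf2": (3171, 1900),
    "orf4": (5992, 5612),
}


def describe_orf(start, end):
    """Return (strand, length in nt, codon positions, leftover nt)."""
    strand = "+" if start <= end else "-"
    length = abs(end - start) + 1          # inclusive coordinates
    codons, remainder = divmod(length, 3)  # remainder 0 means whole codons
    return strand, length, codons, remainder


if __name__ == "__main__":
    for name, (start, end) in ORF_COORDS.items():
        strand, length, codons, remainder = describe_orf(start, end)
        print(f"{name}: strand {strand}, {length} nt, "
              f"{codons} codon positions, remainder {remainder}")
```

Partial reading frames such as orf1 and orf15 are left out of the table because a truncated frame need not span a whole number of codons.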

DEPOSIT INFORMATION

The following material has been deposited with the Agricultural Research Service, 
Patent Culture Collection (NRRL), 1815 North University Street, Peoria, Illinois 61604, under 
the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for 
the Purposes of Patent Procedure. All restrictions on the availability of the deposited 
material will be irrevocably removed upon the granting of a patent. 

Deposited Material    Accession Number    Deposit Date

pEPO15                NRRL B-30033        June 11, 1998

pEPO32                NRRL B-30119        April 16, 1999

DETAILED DESCRIPTION OF THE INVENTION 

The genes involved in the biosynthesis of epothilones can be isolated using the 
techniques according to the present invention. The preferable procedure for the isolation of 
epothilone biosynthesis genes requires the isolation of genomic DNA from an organism 
identified as producing epothilones A and B, and the transfer of the isolated DNA on a 
suitable plasmid or vector to a host organism that does not normally produce the polyketide, 
followed by the identification of transformed host colonies to which the epothilone-producing 
ability has been conferred. Using a technique such as λ::Tn5 transposon mutagenesis (de
Bruijn & Lupski, Gene 27: 131-149 (1984)), the exact region of the transforming epothilone- 
conferring DNA can be more precisely defined. Alternatively or additionally, the transforming 
epothilone-conferring DNA can be cleaved into smaller fragments and the smallest that 
maintains the epothilone-conferring ability further characterized. Whereas the host organism 
lacking the ability to produce epothilone may be a different species from the organism from 
which the polyketide derives, a variation of this technique involves the transformation of host 
DNA into the same host that has had its epothilone-producing ability disrupted by 
mutagenesis. In this method, an epothilone-producing organism is mutated and non- 
epothilone-producing mutants are isolated. These are then complemented by genomic DNA 
isolated from the epothilone-producing parent strain. 

A further example of a technique that can be used to isolate genes required for
epothilone biosynthesis is the use of transposon mutagenesis to generate mutants of an
epothilone-producing organism that, after mutagenesis, fail to produce the polyketide. Thus, the
region of the host genome responsible for epothilone production is tagged by the transposon 
and can be recovered and used as a probe to isolate the native genes from the parent strain. 
PKS genes that are required for the synthesis of polyketides and that are similar to known 
PKS genes may be isolated by virtue of their sequence homology to the biosynthetic genes 
for which the sequence is known, such as those for the biosynthesis of rifamycin or 
soraphen. Techniques suitable for isolation by homology include standard library screening 
by DNA hybridization. 

Preferred for use as a probe molecule is a DNA fragment that is obtainable from a 
gene or another DNA sequence that plays a part in the synthesis of a known polyketide. A 
preferred probe molecule comprises a 1.2 kb SmaI DNA fragment encoding the ketosynthase
domain of the fourth module of the soraphen PKS (U.S. Patent No. 5,716,849), and a
more preferred probe molecule comprises the β-ketoacyl synthase domains from the first
and second modules of the rifamycin PKS (Schupp et al., FEMS Microbiology Letters 159:
201-207 (1998)). These can be used to probe a gene library of an epothilone-producing 
microorganism to isolate the PKS genes responsible for epothilone biosynthesis. 

Despite the well-known difficulties with PKS gene isolation in general and despite the 
difficulties expected to be encountered with the isolation of epothilone biosynthesis genes in 
particular, by using the methods described in the instant specification, biosynthetic genes for 
epothilones A and B can surprisingly be cloned from a microorganism that produces that 
polyketide. Using the methods of gene manipulation and recombinant production described 
in this specification, the cloned PKS genes can be modified and expressed in transgenic
host organisms. 

The isolated epothilone biosynthetic genes can be expressed in heterologous hosts 
to enable the production of the polyketide with greater efficiency than might be possible from 
native hosts. Techniques for these genetic manipulations are specific for the different 
available hosts and are known in the art. For example, heterologous genes can be expressed
in Streptomyces and other actinomycetes using techniques such as those described in
McDaniel et al., Science 262: 1546-1550 (1993) and Kao et al., Science 265: 509-512
(1994), both of which are incorporated herein by reference. See also, Rowe et al., Gene
216: 215-223 (1998); Holmes et al., EMBO Journal 12(8): 3183-3191 (1993) and Bibb et al.,
Gene 38: 215-226 (1985), all of which are incorporated herein by reference.

Alternatively, genes responsible for polyketide biosynthesis, i.e., epothilone biosynthetic
genes, can also be expressed in other host organisms such as pseudomonads and E.
coli. Techniques for these genetic manipulations are specific for the different available hosts
and are known in the art. For example, PKS genes have been successfully expressed in E.
coli using the pT7-7 vector, which uses the T7 promoter. See, Tabor et al., Proc. Natl. Acad.
Sci. USA 82: 1074-1078 (1985), incorporated herein by reference. In addition, the
expression vectors pKK223-3 and pKK223-2 can be used to express heterologous genes in
E. coli, either in transcriptional or translational fusion, behind the tac or trc promoter. For the
expression of operons encoding multiple ORFs, the simplest procedure is to insert the
operon into a vector such as pKK223-3 in transcriptional fusion, allowing the cognate
ribosome binding site of the heterologous genes to be used. Techniques for overexpression in
gram-positive species such as Bacillus are also known in the art and can be used in the
context of this invention (Quax et al., in: Industrial Microorganisms: Basic and Applied
Molecular Genetics, Eds. Baltz et al., American Society for Microbiology, Washington (1993)).

Other expression systems that may be used with the epothilone biosynthetic genes of
the invention include yeast and baculovirus expression systems. See, for example, "The
Expression of Recombinant Proteins in Yeasts," Sudbery, P. E., Curr. Opin. Biotechnol. 7(5):
517-524 (1996); "Methods for Expressing Recombinant Proteins in Yeast," Mackay, et al.,
Editor(s): Carey, Paul R., Protein Eng. Des. 105-153, Publisher: Academic, San Diego, Calif.
(1996); "Expression of heterologous gene products in yeast," Pichuantes, et al., Editor(s):
Cleland, J. L., Craik, C. S., Protein Eng. 129-161, Publisher: Wiley-Liss, New York, N. Y.
(1996); WO 98/27203; Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998); "Insect
Cell Culture: Recent Advances, Bioengineering Challenges And Implications In Protein
Production," Palomares, et al., Editor(s): Galindo, Enrique; Ramirez, Octavio T., Adv.
Bioprocess Eng. Vol. II, Invited Pap. Int. Symp., 2nd (1998) 25-52, Publisher: Kluwer,
Dordrecht, Neth.; "Baculovirus Expression Vectors," Jarvis, Donald L., Editor(s): Miller, Lois
K., Baculoviruses 389-431, Publisher: Plenum, New York, N. Y. (1997); "Production Of
Heterologous Proteins Using The Baculovirus/Insect Expression System," Griffiths, et al.,
Methods Mol. Biol. (Totowa, N. J.) 75 (Basic Cell Culture Protocols (2nd Edition)) 427-440
(1997); and "Insect Cell Expression Technology," Luckow, Verne A., Protein Eng. 183-218,
Publisher: Wiley-Liss, New York, N. Y. (1996); all of which are incorporated herein by
reference.

Another consideration for expression of PKS genes in heterologous hosts is the
requirement of enzymes for posttranslational modification of PKS enzymes by
phosphopantetheinylation before they can synthesize polyketides. However, the enzymes
responsible for this modification of type I PKS enzymes, phosphopantetheinyl (P-pant)
transferases, are not normally present in many hosts such as E. coli. This problem can be
solved by coexpression of a P-pant transferase with the PKS genes in the heterologous host,
as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998), incorporated
herein by reference.

Therefore, for the purposes of polyketide production, the significant criteria in the 
choice of host organism are its ease of manipulation, rapidity of growth (i.e. fermentation), 
possession of the proper molecular machinery for processes such as posttranslational
modification, and its lack of susceptibility to the polyketide being overproduced. Most 
preferred host organisms are actinomycetes such as strains of Streptomyces. Other
preferred host organisms are pseudomonads and E. coli. The above-described methods of
polyketide production have significant advantages over the technology currently used in the 
preparation of the compounds. These advantages include the cheaper cost of production, 
the ability to produce greater quantities of the compounds, and the ability to produce com- 
pounds of a preferred biological enantiomer, as opposed to racemic mixtures inevitably ge- 
nerated by organic synthesis. Compounds produced by heterologous hosts can be used in 
medical (e.g. cancer treatment in the case of epothilones) as well as agricultural applica- 
tions. 

EXPERIMENTAL

The invention will be further described by reference to the following detailed 
examples. These examples are provided for purposes of illustration only, and are not inten- 
ded to be limiting unless otherwise specified. Standard recombinant DNA and molecular 
cloning techniques used here are well known in the art and are described by Ausubel (ed.), 
Current Protocols in Molecular Biology, John Wiley and Sons, Inc. (1994); T. Maniatis, E. F. 
Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor 
Laboratory, Cold Spring Harbor, NY (1989); and by T.J. Silhavy, M.L. Berman, and L.W.
Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring 
Harbor, NY (1984). 

Example 1: Cultivation of an Epothilone-Producing Strain of Sorangium cellulosum

Sorangium cellulosum strain 90 (DSM 6773, Deutsche Sammlung von Mikroorganismen
und Zellkulturen, Braunschweig) is streaked out and grown (30°C) on an agar plate of
SolE medium (0.35% glucose, 0.05% tryptone, 0.15% MgSO4 x 7H2O, 0.05% ammonium
sulfate, 0.1% CaCl2, 0.006% K2HPO4, 0.01% sodium dithionite, 0.0008% Fe-EDTA, 1.2%
HEPES, 3.5% [vol/vol] supernatant of sterilized stationary S. cellulosum culture), pH adjusted
to 7.4. Cells from about 1 square cm are picked and inoculated into 5 ml of G51t liquid medium
(0.2% glucose, 0.5% starch, 0.2% tryptone, 0.1% probion S, 0.05% CaCl2 x 2H2O, 0.05%
MgSO4 x 7H2O, 1.2% HEPES, pH adjusted to 7.4) and incubated at 30°C with shaking at 225 rpm.
After 4 days, the culture is transferred into 50 ml of G51t and incubated as above for 5
days. This culture is used to inoculate 500 ml of G51t and incubated as above for 6 days.
The culture is centrifuged for 10 minutes at 4000 rpm and the cell pellet is resuspended in 50
ml of G51t.
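
For convenience, the percentages of the G51t recipe above can be converted to weighed amounts for the 5 ml, 50 ml, and 500 ml culture volumes used in this example. The following Python sketch is illustrative only and assumes that the percentages are weight per volume (1% corresponding to 1 g per 100 ml), which the text does not state explicitly; the names G51T_PERCENT and grams_for_volume are hypothetical.

```python
# G51t components in percent, as listed in Example 1.
G51T_PERCENT = {
    "glucose": 0.2,
    "starch": 0.5,
    "tryptone": 0.2,
    "probion S": 0.1,
    "CaCl2 x 2H2O": 0.05,
    "MgSO4 x 7H2O": 0.05,
    "HEPES": 1.2,
}


def grams_for_volume(percent_by_component, volume_ml):
    """Return grams of each component for the given volume.

    Assumption (not stated in the text): percentages are weight/volume,
    i.e. 1% = 1 g per 100 ml.
    """
    return {name: pct * volume_ml / 100.0
            for name, pct in percent_by_component.items()}


if __name__ == "__main__":
    for volume_ml in (5, 50, 500):
        print(f"G51t, {volume_ml} ml:")
        for name, grams in grams_for_volume(G51T_PERCENT, volume_ml).items():
            print(f"  {name}: {grams:.4f} g")
```

The pH adjustment to 7.4 is not covered by the sketch and is carried out as described in the text.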

Example 2: Generation of a Bacterial Artificial Chromosome (Bac) Library 

To generate a Bac library, S. cellulosum cells cultivated as described in Example 1 
above are embedded into agarose blocks, lysed, and the liberated genomic DNA is partially 
digested by the restriction enzyme HindIII. The digested DNA is separated on an agarose
gel by pulsed-field electrophoresis. Large (approximately 90-150 kb) DNA fragments are
isolated from the agarose gel and ligated into the vector pBelobacII. pBelobacII contains a
gene encoding chloramphenicol resistance, a multiple cloning site in the lacZ gene providing
for blue/white selection on appropriate medium, as well as the genes required for the 
replication and maintenance of the plasmid at one or two copies per cell. The ligation mix- 
ture is used to transform Escherichia coli DH10B electrocompetent cells using standard 
electroporation techniques. Chloramphenicol-resistant recombinant (white, lacZ mutant) 
colonies are transferred to a positively charged nylon membrane filter in 384 3X3 grid format. 
The clones are lysed and the DNA is cross-linked to the filters. The same clones are also 
preserved as liquid cultures at -80°C. 

Example 3: Screening the Bac Library of Sorangium cellulosum 90 for the Presence of Type I
Polyketide Synthase-Related Sequences

The Bac library filters are probed by standard Southern hybridization procedures.
The DNA probes used encode β-ketoacyl synthase domains from the first and second
modules of the rifamycin polyketide synthase (Schupp et al., FEMS Microbiology Letters
159: 201-207 (1998)). The probe DNAs are generated by PCR with primers flanking each
ketosynthase domain using the plasmid pNE95 as the template (pNE95 equals cosmid 2
described in Schupp et al. (1998)). 25 ng of PCR-amplified DNA is isolated from a 0.5%
agarose gel and labeled with 32P-dCTP using a random primer labeling kit (Gibco-BRL,
Bethesda MD, USA) according to the manufacturer's instructions. Hybridization is at 65°C
for 36 hours and membranes are washed at high stringency (3 times with 0.1 x SSC and
0.5% SDS for 20 min at 65°C). The labeled blot is exposed on a phosphorescent screen
and the signals are detected on a PhosphorImager 445SI (screen and 445SI from Molecular
Dynamics). This results in strong hybridization of certain Bac clones to the probes. These
clones are selected and cultured overnight in 5 ml of Luria broth (LB) at 37°C. Bac DNA
from the Bac clones of interest is isolated by a typical miniprep procedure. The cells are
resuspended in 200 µl lysozyme solution (50 mM glucose, 10 mM EDTA, 25 mM Tris-HCl,
5 mg/ml lysozyme), lysed in 400 µl lysis solution (0.2 N NaOH and 2% SDS), the proteins are
precipitated (3.0 M potassium acetate, adjusted to pH 5.2 with acetic acid), and the Bac DNA
is precipitated with isopropanol. The DNA is resuspended in 20 µl of nuclease-free distilled
water, restricted with BamHI (New England Biolabs, Inc.) and separated on a 0.7% agarose
gel. The gel is blotted by Southern hybridization as described above and probed under
conditions described above, with a 1.2 kb SmaI DNA fragment encoding the ketosynthase
domain of the fourth module of the soraphen polyketide synthase as the probe (see, U.S.
Patent No. 5,716,849). Five different hybridization patterns are observed. One clone
representing each of the five patterns is selected and named pEPO15, pEPO20, pEPO30,
pEPO31, and pEPO33, respectively.

Example 4: Subcloning of BamHI Fragments from pEPO15, pEPO20, pEPO30, pEPO31,
and pEPO33

The DNA of the five selected Bac clones is digested with BamHI and random fragments
are subcloned into pBluescript II SK+ (Stratagene) at the BamHI site. Subclones carrying
inserts between 2 and 10 kb in size are selected for sequencing of the flanking ends of
the inserts and also probed with the 1.2 kb SmaI probe as described above. Subclones that
show a high degree of sequence homology to known polyketide synthases and/or strong
hybridization to the soraphen ketosynthase domain are used for gene disruption
experiments.

Example 5: Preparation of Streptomycin-Resistant Spontaneous Mutants of Sorangium
cellulosum strain So ce90

0.1 ml of a three day old culture of Sorangium cellulosum strain So ce90, which is
raised in liquid medium G52-H (0.2% yeast extract, 0.2% defatted soya meal, 0.8% potato
starch, 0.2% glucose, 0.1% MgSO4 x 7H2O, 0.1% CaCl2 x 2H2O, 0.008% Fe-EDTA, pH adjusted
to 7.4 with KOH), is plated out on agar plates with SolE medium supplemented with 100 µg/ml
streptomycin. The plates are incubated at 30°C for 2 weeks. The colonies growing on this
medium are streptomycin-resistant mutants, which are streaked out and cultivated once
more on the same agar medium with streptomycin for purification. One of these
streptomycin-resistant mutants is selected and is called BCE28/2.

Example 6: Gene Disruptions in Sorangium cellulosum BCE28/2 Using the Subcloned
BamHI Fragments



The BamHI inserts of the subclones generated from the five selected Bac clones as
described above are isolated and ligated into the unique BamHI site of plasmid pCIB132
(see, U.S. Patent No. 5,716,849). The pCIB132 derivatives carrying the inserts are
transformed into Escherichia coli ED8767 containing the helper plasmid pUZ8 (Hedges and
Matthew, Plasmid 2: 269-278 (1979)). The transformants are used as donors in conjugation
experiments with Sorangium cellulosum BCE28/2 as recipient. For the conjugation, 5-10 x
10^9 cells of Sorangium cellulosum BCE28/2 from an early stationary phase culture (reaching
about 5 x 10^8 cells/ml) grown at 30°C in liquid medium G51b (G51b equals medium G51t
with tryptone replaced by peptone) are mixed in a 1:1 cellular ratio with a late-log phase
culture (in LB liquid medium) of E. coli ED8767 containing pCIB132 derivatives carrying the
subcloned BamHI fragments and the helper plasmid pUZ8. The mixed cells are then
centrifuged at 4000 rpm for 10 minutes and resuspended in 0.5 ml G51b medium. This cell
suspension is then plated as a drop in the center of a plate with SolE agar containing 50 mg/l
kanamycin. The cells obtained after incubation for 24 hours at 30°C are harvested and
resuspended in 0.8 ml of G51b medium, and 0.1 to 0.3 ml of this suspension is plated out on a
selective SolE solid medium containing phleomycin (30 mg/l), streptomycin (300 mg/l), and
kanamycin (50 mg/l). The counterselection of the donor Escherichia coli strain takes place
with the aid of streptomycin. The colonies that grow on this selective medium after an
incubation time of 8-12 days at a temperature of 30°C are isolated with a plastic loop and
streaked out and cultivated on the same agar medium for a second round of selection and
purification. The colony-derived cultures that grow on this selective agar medium after 7
days at a temperature of 30°C are transconjugants of Sorangium cellulosum BCE28/2 that
have acquired phleomycin resistance by conjugative transfer of the pCIB132 derivatives
carrying the subcloned BamHI fragments.
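
The cell numbers given for the conjugation imply a culture volume for the recipient strain. As a rough sketch only, the arithmetic can be written in Python as follows; the function name culture_volume_ml is hypothetical, and the cell density is the approximate figure quoted in the text, so the result is an estimate rather than a prescribed volume.

```python
def culture_volume_ml(target_cells, density_cells_per_ml):
    """Volume of culture (ml) that contains the target number of cells."""
    if density_cells_per_ml <= 0:
        raise ValueError("cell density must be positive")
    return target_cells / density_cells_per_ml


if __name__ == "__main__":
    density = 5e8                  # about 5 x 10^8 cells/ml, from the text
    low = culture_volume_ml(5e9, density)    # lower bound: 5 x 10^9 cells
    high = culture_volume_ml(1e10, density)  # upper bound: 10 x 10^9 cells
    print(f"Recipient culture needed: about {low:.0f}-{high:.0f} ml")
```

The same function can be applied to the donor E. coli culture once its density is known, so that the two strains are mixed at the 1:1 cellular ratio described above.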

Integration of the pCIB132-derived plasmids into the chromosome of Sorangium
cellulosum BCE28/2 by homologous recombination is verified by Southern hybridization. For
this experiment, complete DNA from 5-10 transconjugants per transferred BamHI fragment is
isolated (from 10 ml cultures grown in medium G52-H for three days) applying the method
described by Pospiech and Neumann, Trends Genet. 11: 217 (1995). For the Southern blot,
the DNA isolated as described above is cleaved with one of the restriction enzymes BglII,
ClaI, or NotI, and the respective BamHI inserts or pCIB132 are used as 32P-labelled probes.

Example 7: Analysis of the Effect of the Integrated BamHI Fragments on Epothilone
Production by Sorangium cellulosum After Gene Disruption

Transconjugant cells grown on about 1 square cm surface of the selective SolE
plates of the second round of selection (see Example 6) are transferred by a sterile plastic
loop into 10 ml of medium G52-H in a 50 ml Erlenmeyer flask. After incubation at 30°C and
180 rpm for 3 days, the culture is transferred into 50 ml of medium G52-H in a 200 ml
Erlenmeyer flask. After incubation at 30°C and 180 rpm for 4-5 days, 10 ml of this culture is
transferred into 50 ml of medium 23B3 (0.2% glucose, 2% potato starch, 1.6% defatted soya
meal, 0.0008% Fe-EDTA sodium salt, 0.5% HEPES (4-(2-hydroxyethyl)-piperazine-1-
ethane-sulfonic acid), 2% vol/vol polystyrene resin XAD16 (Rohm & Haas), pH adjusted to
7.8 with NaOH) in a 200 ml Erlenmeyer flask.

Quantitative determination of the epothilone produced takes place after incubation of
the cultures at 30°C and 180 rpm for 7 days. The complete culture broth is filtered by
suction through a 150 µm nylon filter. The resin remaining on the filter is then resuspended
in 10 ml isopropanol and extracted by shaking the suspension at 180 rpm for 1 hour. 1 ml is
removed from this suspension and centrifuged at 12,000 rpm in an Eppendorf Microfuge.
The amount of epothilones A and B therein is determined by means of HPLC and
detection at 250 nm with a UV-DAD detector (HPLC with Waters Symmetry C18 column and
a gradient of 0.02% phosphoric acid 60%-0% and acetonitrile 40%-100%).

Transconjugants with three different integrated BamHI fragments subcloned from
pEPO15, namely transconjugants with the BamHI fragment of plasmid pEPO15-21,
transconjugants with the BamHI fragment of plasmid pEPO15-4-5, and transconjugants with the
BamHI fragment of plasmid pEPO15-4-1, are tested in the manner described above. HPLC
analysis reveals that all transconjugants no longer produce epothilone A or B. By contrast,
epothilone A and B are detectable in a concentration of 2-4 mg/l in transconjugants with
BamHI fragments integrated that are derived from pEPO20, pEPO30, pEPO31, pEPO33,
and in the parental strain BCE28/2.

Example 8: Nucleotide Sequence Determination of the Cloned Fragments and
Construction of Contigs

A. BamHI Insert of Plasmid pEPO15-21

Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-21], and
the nucleotide sequence of the 2.3-kb BamHI insert in pEPO15-21 is determined. Automated
DNA sequencing is done on the double-stranded DNA template by the dideoxynucleotide
chain termination method, using Applied Biosystems model 377 sequencers. The primers 
used are the universal reverse primer (5' GGA AAC AGC TAT GAC CAT G 3' (SEQ ID 
NO:24)) and the universal forward primer (5' GTA AAA CGA CGG CCA GT 3' (SEQ ID 
NO:25)). In subsequent rounds of sequencing reactions, custom-synthesized oligonucle- 
otides, designed for the 3' ends of the previously determined sequences, are used to extend 
and join contigs. Both strands are entirely sequenced, and every nucleotide is sequenced at 
least two times. The nucleotide sequence is compiled using the program Sequencher vers. 
3.0 (Gene Codes Corporation), and analyzed using the University of Wisconsin Genetics
Computer Group programs. The nucleotide sequence of the 2213-bp insert corresponds to 
nucleotides 20779-22991 of SEQ ID NO:1. 

B. BamHI Insert of Plasmid pEPO15-4-1

Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-4-1], and
the nucleotide sequence of the 3.9-kb BamHI insert in pEPO15-4-1 is determined as
described in (A) above. The nucleotide sequence of the 3909-bp insert corresponds to nucleotides
16876-20784 of SEQ ID NO:1.

C. BamHI Insert of Plasmid pEPO15-4-5

Plasmid DNA is isolated from the strain Escherichia coli DH10B [pEPO15-4-5], and
the nucleotide sequence of the 2.3-kb BamHI insert in pEPO15-4-5 is determined as
described in (A) above. The nucleotide sequence of the 2233-bp insert corresponds to
nucleotides 42528-44760 of SEQ ID NO:1.
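
The insert sizes and coordinates reported in (A) to (C) can be cross-checked against each other and against the gene coordinates given in the description of the sequences. The following Python sketch is illustrative only; it assumes inclusive coordinates, and the names EPO_GENES, INSERTS, span_length, and genes_overlapping are hypothetical. Under these assumptions, the inserts of pEPO15-4-1 and pEPO15-21 share six positions (20779-20784), which would be consistent with the two fragments abutting at a common six-base BamHI recognition site.

```python
# Gene coordinates in SEQ ID NO:1, from the description of the sequences.
EPO_GENES = {
    "epoA": (7610, 11875), "epoP": (11872, 16104), "epoB": (16251, 21749),
    "epoC": (21746, 43519), "epoD": (43524, 54920), "epoE": (54935, 62254),
    "epoF": (62369, 63628),
}

# plasmid: (reported insert size in bp, first nt, last nt in SEQ ID NO:1)
INSERTS = {
    "pEPO15-21": (2213, 20779, 22991),
    "pEPO15-4-1": (3909, 16876, 20784),
    "pEPO15-4-5": (2233, 42528, 44760),
}


def span_length(start, end):
    """Length of an inclusive coordinate range."""
    return end - start + 1


def genes_overlapping(start, end):
    """Names of genes whose range shares at least one position with [start, end]."""
    return [g for g, (s, e) in EPO_GENES.items() if s <= end and start <= e]


if __name__ == "__main__":
    for plasmid, (size, start, end) in INSERTS.items():
        consistent = span_length(start, end) == size
        print(f"{plasmid}: {size} bp, size consistent: {consistent}, "
              f"overlaps {genes_overlapping(start, end)}")
```

Each disrupting fragment falls within, or spans the junction of, the polyketide synthase genes epoB, epoC, and epoD in this check.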

Example 9: Subcloning and Ordering of DNA Fragments from pEPO15 Containing
Epothilone Biosynthesis Genes

pEPO15 is digested to completion with the restriction enzyme HindIII and the resulting
fragments are subcloned into pBluescript II SK- or pNEB193 (New England Biolabs) that
has been cut with HindIII and dephosphorylated with calf intestinal alkaline phosphatase. Six
different clones are generated and named pEPO15-NH1, pEPO15-NH2, pEPO15-NH6,
pEPO15-NH24 (all based on pNEB193), and pEPO15-H2.7 and pEPO15-H3.0 (both based
on pBluescript II SK-).

The BamHI insert of pEPO15-21 is isolated and DIG-labeled (Non-radioactive DNA
labeling and detection system, Boehringer Mannheim), and used as a probe in DNA
hybridization experiments at high stringency against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6,
pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. A strong hybridization signal is detected for
pEPO15-NH24, indicating that pEPO15-21 is contained within pEPO15-NH24.

The BamHI insert of pEPO15-4-1 is isolated and DIG-labeled as above, and used as
a probe in DNA hybridization experiments at high stringency against pEPO15-NH1,
pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. Strong
hybridization signals are detected for pEPO15-NH24 and pEPO15-H2.7. Nucleotide
sequence data generated from one end each of pEPO15-NH24 and pEPO15-H2.7 are also in
complete agreement with the previously determined sequence of the BamHI insert of
pEPO15-4-1. These experiments demonstrate that pEPO15-4-1 (which contains one
internal HindIII site) overlaps pEPO15-H2.7 and pEPO15-NH24, and that pEPO15-H2.7 and
pEPO15-NH24, in this order, are contiguous.

The BamHI insert of pEPO15-4-5 is isolated and DIG-labeled as above, and used as
a probe in DNA hybridization experiments at high stringency against pEPO15-NH1,
pEPO15-NH2, pEPO15-NH6, pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. A strong
hybridization signal is detected for pEPO15-NH2, indicating that pEPO15-4-5 is contained
within pEPO15-NH2.

Nucleotide sequence data is generated from both ends of pEPO15-NH2 and from the
end of pEPO15-NH24 that does not overlap with pEPO15-4-1. PCR primers NH24 end "B":
GTGACTGGCGCCTGGAATCTGCATGAGC (SEQ ID NO:26), NH2 end "A":
AGCGGGAGCTTGCTAGACATTCTGTTTC (SEQ ID NO:27), and NH2 end "B":
GACGCGCCTCGGGCAGCGCCCCAA (SEQ ID NO:28), pointing towards the HindIII sites,
are designed based on these sequences and used in amplification reactions with pEPO15
and, in separate experiments, with Sorangium cellulosum So ce90 genomic DNA as the
templates. Specific amplification is found with primer pair NH24 end "B" and NH2 end "A"
with both templates. The amplimers are cloned into pBluescript II SK- and completely
sequenced. The sequences of the amplimers are identical, and also agree completely with the
end sequences of pEPO15-NH24 and pEPO15-NH2, fused at the HindIII site, establishing
that the HindIII fragments of pEPO15-NH2 and pEPO15-NH24 are, in this order, contiguous.

The HindIII insert of pEPO15-H2.7 is isolated and DIG-labeled as above, and used as
a probe in a DNA hybridization experiment at high stringency against pEPO15 digested by
NotI. A NotI fragment of about 9 kb in size shows strong hybridization, and is further
subcloned into pBluescript II SK- that has been digested with NotI and dephosphorylated
with calf intestinal alkaline phosphatase, to yield pEPO15-N9-16. The NotI insert of
pEPO15-N9-16 is isolated and DIG-labeled as above, and used as a probe in DNA hybridization
experiments at high stringency against pEPO15-NH1, pEPO15-NH2, pEPO15-NH6,
pEPO15-NH24, pEPO15-H2.7 and pEPO15-H3.0. Strong hybridization signals are detected
for pEPO15-NH6, and also for the expected clones pEPO15-H2.7 and pEPO15-NH24.
Nucleotide sequence data is generated from both ends of pEPO15-NH6 and from the end of
pEPO15-H2.7 that does not overlap with pEPO15-4-1. PCR primers are designed pointing
towards the HindIII sites and used in amplification reactions with pEPO15 and, in separate
experiments, with Sorangium cellulosum So ce90 genomic DNA as the templates. Specific
amplification is found with primer pair pEPO15-NH6 end "B":
CACCGAAGCGTCGATCTGGTCCATC (SEQ ID NO:29) and pEPO15-H2.7 end "A":
CGGTCAGATCGACGACGGGCTTTCC (SEQ ID NO:30) with both templates. The amplimers
are cloned into pBluescript II SK- and completely sequenced. The sequences of the
amplimers are identical, and also agree completely with the end sequences of pEPO15-NH6
and pEPO15-H2.7, fused at the HindIII site, establishing that the HindIII fragments of
pEPO15-NH6 and pEPO15-H2.7 are, in this order, contiguous.

All of these experiments, taken together, establish a contig of HindIII fragments
covering a region of about 55 kb and consisting of the HindIII inserts of pEPO15-NH6,
pEPO15-H2.7, pEPO15-NH24, and pEPO15-NH2, in this order. The inserts of the
remaining two HindIII subclones, namely pEPO15-NH1 and pEPO15-H3.0, are not found to be
parts of this contig.
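
The way the pairwise results of this example combine into a single ordered contig can be sketched as a simple chaining of adjacent fragments. The Python code below is an illustration, not the procedure used in the example; the function name order_fragments is hypothetical, and each pair is oriented as (upstream, downstream) according to the final contig order stated above.

```python
def order_fragments(adjacent_pairs):
    """Chain (upstream, downstream) pairs into one ordered list of fragments."""
    successor = dict(adjacent_pairs)
    # The first fragment is the one that never appears as a downstream partner.
    starts = set(successor) - set(successor.values())
    if len(starts) != 1:
        raise ValueError("pairs do not describe a single linear contig")
    order = [starts.pop()]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order


if __name__ == "__main__":
    # Adjacent HindIII subclones, listed in no particular order.
    pairs = [
        ("pEPO15-NH24", "pEPO15-NH2"),
        ("pEPO15-NH6", "pEPO15-H2.7"),
        ("pEPO15-H2.7", "pEPO15-NH24"),
    ]
    print(" -> ".join(order_fragments(pairs)))
```

Subclones that take part in no adjacency, such as pEPO15-NH1 and pEPO15-H3.0, simply do not appear in the resulting chain.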

Example 10: Further Extension of the Subclone Contig Covering the Epothilone Biosynthesis
Genes

An approximately 2.2 kb BamHI - HindIII fragment derived from the downstream end
of the insert of pEPO15-NH2 and thus representing the downstream end of the subclone
contig described in Example 9 is isolated, DIG-labeled, and used in Southern hybridization
experiments against pEPO15 and pEPO15-NH2 DNAs digested with several enzymes. The
strongly hybridizing bands are always found to be the same in size between the two target
DNAs, indicating that the Sorangium cellulosum So ce90 genomic DNA fragment cloned into
pEPO15 ends with the HindIII site at the downstream end of pEPO15-NH2.

A cosmid DNA library of Sorangium cellulosum So ce90 is generated, using established
procedures, in pScosTriplex-II (Ji, et al., Genomics 31: 185-192 (1996)). Briefly,
high-molecular weight genomic DNA of Sorangium cellulosum So ce90 is partially digested with
the restriction enzyme Sau3AI to provide fragments with average sizes of about 40 kb, and
ligated to BamHI and XbaI digested pScosTriplex-II. The ligation mix is packaged with
Gigapack III XL (Stratagene) and used to transfect E. coli XL1 Blue MR cells.

The cosmid library is screened with the approximately 2.2 kb BamHI - HindIII fragment,
derived from the downstream end of the insert of pEPO15-NH2, used as a probe in
colony hybridization. A strongly hybridizing clone, named pEPO4E7, is selected.

pEPO4E7 DNA is isolated, digested with several restriction endonucleases, and
probed in Southern hybridization experiments with the 2.2 kb BamHI - HindIII fragment. A
strongly hybridizing NotI fragment of approximately 9 kb in size is selected and subcloned
into pBluescript II SK- to yield pEPO4E7-N9-8. Further Southern hybridization experiments
reveal that the approximately 9 kb NotI insert of pEPO4E7-N9-8 overlaps pEPO15-NH2 over
6 kb in a NotI - HindIII fragment, while the remaining approximately 3 kb HindIII - NotI
fragment would extend the subclone contig described in Example 9. End sequencing
reveals, however, that the downstream end of the insert of pEPO4E7-N9-8 contains the
BamHI - NotI polylinker of pScosTriplex-II, thereby indicating that the genomic DNA insert of
pEPO4E7 ends at a Sau3AI site within the extending HindIII - NotI fragment and that the
NotI site is derived from pScosTriplex-II.

An approximately 1.6 kb PstI - SalI fragment derived from the approximately 3 kb
extending HindIII - NotI subfragment of pEPO4E7-N9-8, containing only Sorangium
cellulosum So ce90-derived sequences free of vector, is used as a probe against the
bacterial artificial chromosome library described in Example 2. Besides the previously
isolated EPO15, a BAC clone, named EPO32, is found to strongly hybridize to the probe.
pEPO32 is isolated, digested with several restriction endonucleases, and hybridized with the
approximately 1.6 kb PstI - SalI probe. A HindIII - EcoRV fragment of about 13 kb in size is
found to strongly hybridize to the probe, and is subcloned into pBluescript II SK- digested
with HindIII and HincII to yield pEPO32-HEV15.

Oligonucleotide primers are designed based on the downstream end sequence of
pEPO15-NH2 and on the upstream (HindIII) end sequence derived from pEPO32-HEV15,
and used in sequencing reactions with pEPO4E7-N9-8 as the template. The sequences
reveal the existence of a small HindIII fragment (EPO4E7-H0.02) of 24 bp, undetectable in
standard restriction analysis, separating the HindIII site at the downstream end of
pEPO15-NH2 from the HindIII site at the upstream end of pEPO32-HEV15.

Thus, the subclone contig described in Example 9 is extended to include the HindIII
fragment EPO4E7-H0.02 and the insert of pEPO32-HEV15, and constitutes the inserts of:
pEPO15-NH6, pEPO15-H2.7, pEPO15-NH24, pEPO15-NH2, EPO4E7-H0.02 and
pEPO32-HEV15, in this order.

Example 11: Nucleotide Sequence Determination of the Subclone Contig Covering the Epothilone Biosynthesis Genes

The nucleotide sequence of the subclone contig described in Example 10 is 
determined as follows. 

pEPO15-H2.7. Plasmid DNA is isolated from the strain Escherichia coli DH10B
[pEPO15-H2.7], and the nucleotide sequence of the 2.7-kb BamHI insert in pEPO15-H2.7 is
determined. Automated DNA sequencing is done on the double-stranded DNA template by
the dideoxynucleotide chain termination method, using Applied Biosystems model 377
sequencers. The primers used are the universal reverse primer (5' GGA AAC AGC TAT
GAC CAT G 3' (SEQ ID NO:24)) and the universal forward primer (5' GTA AAA CGA CGG
CCA GT 3' (SEQ ID NO:25)). In subsequent rounds of sequencing reactions,
custom-synthesized oligonucleotides, designed for the 3' ends of the previously determined
sequences, are used to extend and join contigs.

pEPO15-NH6, pEPO15-NH24 and pEPO15-NH2. The HindIII inserts of these plasmids
are isolated, and subjected to random fragmentation using a Hydroshear apparatus
(Genomic Instrumentation Services, Inc.) to yield an average fragment size of 1-2 kb. The
fragments are end-repaired using T4 DNA Polymerase and Klenow DNA Polymerase
enzymes in the presence of deoxynucleotide triphosphates, and phosphorylated with T4 DNA
Kinase in the presence of ribo-ATP. Fragments in the size range of 1.5-2.2 kb are isolated
from agarose gels, and ligated into pBluescript II SK- that has been cut with EcoRV and
dephosphorylated. Random subclones are sequenced using the universal reverse and the
universal forward primers.

pEPO32-HEV15. pEPO32-HEV15 is digested with HindIII and SspI, the approximately
13.3 kb fragment containing the ~13 kb HindIII - EcoRV insert from Sorangium cellulosum So
ce90 and a 0.3 kb HincII - SspI fragment from pBluescript II SK- is isolated, and partially
digested with HaeIII to yield fragments with an average size of 1-2 kb. Fragments in the size
range of 1.5-2.2 kb are isolated from agarose gels, and ligated into pBluescript II SK- that
has been cut with EcoRV and dephosphorylated. Random subclones are sequenced using
the universal reverse and the universal forward primers.

The chromatograms are analyzed and assembled into contigs with the Phred, Phrap
and Consed programs (Ewing, et al., Genome Res. 8(3): 175-185 (1998); Ewing, et al.,
Genome Res. 8(3): 186-194 (1998); Gordon, et al., Genome Res. 8(3): 195-202 (1998)).
Contig gaps are filled, sequence discrepancies are resolved, and low-quality regions are
resequenced using custom-designed oligonucleotide primers for sequencing on either the
original subclones or selected clones from the random subclone libraries. Both strands are
completely sequenced, and every basepair is covered with at least a minimum aggregated
Phred score of 40 (confidence level of 99.99%).
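
The relation between a Phred score and the stated confidence level follows from the standard definition Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The short sketch below illustrates that conversion; the function names are chosen here for illustration and are not part of the Phred, Phrap or Consed programs.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P to a Phred quality score Q = -10*log10(P)."""
    return -10 * math.log10(p)


def confidence_percent(q: float) -> float:
    """Confidence (in percent) that a base with Phred score Q is called correctly."""
    return 100 * (1 - phred_to_error_probability(q))


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(q, phred_to_error_probability(q), confidence_percent(q))
```

A score of 40 thus corresponds to one expected error in 10,000 bases, which matches the 99.99% confidence level quoted above.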

The nucleotide sequence of the 68750 bp contig is shown as SEQ ID NO:1.

Example 12: Nucleotide Sequence Analysis of the Epothilone Biosynthesis Genes

SEQ ID NO:1 is found to contain 22 ORFs as detailed below in Table 1:



Table 1

ORF      Start codon      Stop codon       Homology of deduced protein          Proposed function of deduced protein

orf1     outside of       1826
         sequenced range
orf2*    3171             1900             Hypothetical protein SP:Q11037;
                                           DD-peptidase SP:P15555
orf3     3415             5556             Na/H antiporter PID:D1017724         Transport
orf4*    5992             5612
orf5     6226             6675
epoA     7610             11875            Type I polyketide synthase           Epothilone synthase: Thiazole ring
                                                                                formation
epoP     11872            16104            Non-ribosomal peptide synthetase     Epothilone synthase: Thiazole ring
                                                                                formation
epoB     16251            21749            Type I polyketide synthase           Epothilone synthase: Polyketide
                                                                                backbone formation
epoC     21746            43519            Type I polyketide synthase           Epothilone synthase: Polyketide
                                                                                backbone formation
epoD     43524            54920            Type I polyketide synthase           Epothilone synthase: Polyketide
                                                                                backbone formation
epoE     54935            62254            Type I polyketide synthase           Epothilone synthase: Polyketide
                                                                                backbone formation
epoF     62369            63628            Cytochrome P450                      Epothilone macrolactone oxidase
orf6     63779            64333
orf7*    64290            63853
orf8     64363            64920
orf9*    64727            64287
orf10    65063            65767
orf11*   65874            65008
orf12*   66338            65871
orf13    66667            67137
orf14    67334            68251            Hypothetical protein GI:3293544;     Transport
                                           Cation efflux system protein
                                           GI:2623026
orf15    68346            outside of
                          sequenced range

* On the reverse complementary strand. Numbering according to SEQ ID NO:1.
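
The coordinates in Table 1 can be checked for internal consistency with a short script. The sketch below uses only the start and stop positions of the epo genes as listed in the table. The helper names are illustrative, and the reading that each stop coordinate marks the last base of the stop codon is an assumption made here.

```python
# Start/stop coordinates of the epo genes, copied from Table 1
# (numbering according to SEQ ID NO:1, all on the forward strand).
EPO_GENES = [
    ("epoA", 7610, 11875),
    ("epoP", 11872, 16104),
    ("epoB", 16251, 21749),
    ("epoC", 21746, 43519),
    ("epoD", 43524, 54920),
    ("epoE", 54935, 62254),
    ("epoF", 62369, 63628),
]


def orf_length(start: int, stop: int) -> int:
    """Length in nucleotides, counting both end coordinates."""
    return abs(stop - start) + 1


def strand(start: int, stop: int) -> str:
    """'+' if the ORF runs in the numbering direction, '-' if it is reversed."""
    return "+" if start < stop else "-"


def spacing(genes):
    """Distance in bp between consecutive genes; a negative value is an overlap."""
    return [(a[0], b[0], b[1] - a[2] - 1) for a, b in zip(genes, genes[1:])]


if __name__ == "__main__":
    for name, start, stop in EPO_GENES:
        n = orf_length(start, stop)
        # Assumption: the stop coordinate is the last base of the stop codon.
        print(f"{name}: strand {strand(start, stop)}, {n} nt, {n // 3} codons")
    for left, right, gap in spacing(EPO_GENES):
        kind = "overlap" if gap < 0 else "gap"
        print(f"{left} -> {right}: {kind} of {abs(gap)} bp")
```

With these coordinates every epo gene has a length divisible by three, and epoA/epoP as well as epoB/epoC overlap by 4 bp.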



epoA (nucleotides 7610-11875 of SEQ ID NO:1) codes for EPOS A (SEQ ID NO:2), a
type I polyketide synthase consisting of a single module, and harboring the following
domains: β-ketoacyl-synthase (KS) (nucleotides 7643-8920 of SEQ ID NO:1, amino acids
11-437 of SEQ ID NO:2); acyltransferase (AT) (nucleotides 9236-10201 of SEQ ID NO:1, amino
acids 543-864 of SEQ ID NO:2); enoyl reductase (ER) (nucleotides 10529-11428 of SEQ ID
NO:1, amino acids 974-1273 of SEQ ID NO:2); and acyl carrier protein homologous domain
(ACP) (nucleotides 11549-11764 of SEQ ID NO:1, amino acids 1314-1385 of SEQ ID NO:2).
Sequence comparisons and motif analysis (Haydock, et al., FEBS Lett. 374: 246-248 (1995);
Tang, et al., Gene 216: 255-265 (1998)) reveal that the AT encoded by EPOS A is specific
for malonyl-CoA. EPOS A should be involved in the initiation of epothilone biosynthesis by
loading the acetate unit to the multienzyme complex that will eventually form part of the
2-methylthiazole ring (C26 and C20).
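
Domain boundaries are given throughout this Example both as nucleotide positions in SEQ ID NO:1 and as amino acid positions in the deduced protein. For a gene on the forward strand the two numbering schemes are related by simple codon arithmetic, sketched below. The function names are illustrative, and the sketch assumes that the first nucleotide of the gene is the first base of codon 1.

```python
def aa_range_to_nt(gene_start: int, aa_first: int, aa_last: int) -> tuple[int, int]:
    """Map an amino acid range of a forward-strand gene to nucleotide positions.

    gene_start is the position of the first base of the start codon.
    """
    nt_first = gene_start + 3 * (aa_first - 1)   # first base of codon aa_first
    nt_last = gene_start + 3 * aa_last - 1       # third base of codon aa_last
    return nt_first, nt_last


def nt_to_aa(gene_start: int, nt: int) -> int:
    """Return the 1-based codon (amino acid) number that contains position nt."""
    return (nt - gene_start) // 3 + 1


if __name__ == "__main__":
    EPOA_START = 7610  # start of epoA in SEQ ID NO:1, from Table 1
    # AT domain of EPOS A: amino acids 543-864
    print(aa_range_to_nt(EPOA_START, 543, 864))
    # ACP domain of EPOS A: amino acids 1314-1385
    print(aa_range_to_nt(EPOA_START, 1314, 1385))
```

For example, amino acids 543-864 of EPOS A map to nucleotides 9236-10201, in agreement with the AT domain coordinates quoted above.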

epoP (nucleotides 11872-16104 of SEQ ID NO:1) codes for EPOS P (SEQ ID NO:3), a 
non-ribosomal peptide synthetase containing one module. EPOS P harbors the following 
domains: 

• peptide bond formation domain, as delineated by motif K (amino acids 72-81 
[FPLTDIQESY] of SEQ ID NO:3, corresponding to nucleotide positions 12085-12114 of 
SEQ ID NO:1); motif L (amino acids 118-125 [WARHDML] of SEQ ID NO:3, corresponding
to nucleotide positions 12223-12246 of SEQ ID NO:1); motif M (amino acids 199-212
[SIDLINVDLGSLSI] of SEQ ID NO:3, corresponding to nucleotide positions 12466-12507 
of SEQ ID NO:1); and motif O (amino acids 353-363 [GDFTSMVLLDI] of SEQ ID NO:3, 
corresponding to nucleotide positions 12928-12960 of SEQ ID NO:1); 

• aminoacyl adenylate formation domain, as delineated by motif A (amino acids 549-565 
[LTYEELSRRSRRLGARL] of SEQ ID NO:3, corresponding to nucleotide positions 13516- 
13566 of SEQ ID NO:1); motif B (amino acids 588-603 [VAVLAVLESGAAYVPI] of SEQ 
ID NO:3, corresponding to nucleotide positions 13633-13680 of SEQ ID NO:1); motif C 
(amino acids 669-684 [AYVIYTSGSTGLPKGV] of SEQ ID NO:3, corresponding to 
nucleotide positions 13876-13923 of SEQ ID NO:1); motif D (amino acids 815-821 
[SLGGATE] of SEQ ID NO:3, corresponding to nucleotide positions 14313-14334 of SEQ 
ID NO:1); motif E (amino acids 868-892 [GQLYIGGVGLALGYWRDEEKTRKSF] of SEQ
ID NO:3, corresponding to nucleotide positions 14473-14547 of SEQ ID NO:1); motif F 
(amino acids 903-912 [YKTGDLGRYL] of SEQ ID NO:3, corresponding to nucleotide 
positions 14578-14607 of SEQ ID NO:1); motif G (amino acids 918-940 
[EFMGREDNQIKLRGYRVELGEIE] of SEQ ID NO:3, corresponding to nucleotide 
positions 14623-14692 of SEQ ID NO:1); motif H (amino acids 1268-1274 [LPEYMVP] of 
SEQ ID NO:3, corresponding to nucleotide positions 15673-15693 of SEQ ID NO:1); and 
motif I (amino acids 1285-1297 [LTSNGKVDRKALR] of SEQ ID NO:3, corresponding to 
nucleotide positions 15724-15762 of SEQ ID NO:1);

• an unknown domain, inserted between motifs G and H of the aminoacyl adenylate
formation domain (amino acids 973-1256 of SEQ ID NO:3, corresponding to nucleotide 
positions 14788-15639 of SEQ ID NO:1); and 

• a peptidyl carrier protein homologous domain (PCP), delineated by motif J (amino 
acids 1344-1351 [GATSIHIV] of SEQ ID NO:3, corresponding to nucleotide positions 
15901-15924 of SEQ ID NO:1). 

It is proposed that EPOS P is involved in the activation of a cysteine by adenylation, binding
the activated cysteine as an aminoacyl-S-PCP, forming a peptide bond between the en- 
zyme-bound cysteine and the acetyl-S-ACP supplied by EPOS A, and the formation of the 
initial thiazoline ring by intramolecular heterocyclization. The unknown domain of EPOS P 
displays very weak homologies to NAD(P)H oxidases and reductases from Bacillus species. 
Thus, this unknown domain and/or the ER domain of EPOS A may be involved in the oxida- 
tion of the initial 2-methylthiazoline ring to a 2-methylthiazole. 

epoB (nucleotides 16251-21749 of SEQ ID NO:1) codes for EPOS B (SEQ ID NO:4), a
type I polyketide synthase consisting of a single module, and harboring the following
domains: KS (nucleotides 16269-17546 of SEQ ID NO:1, amino acids 7-432 of SEQ ID NO:4);
AT (nucleotides 17865-18827 of SEQ ID NO:1, amino acids 539-859 of SEQ ID NO:4);
dehydratase (DH) (nucleotides 18855-19361 of SEQ ID NO:1, amino acids 869-1037 of SEQ
ID NO:4); β-ketoreductase (KR) (nucleotides 20565-21302 of SEQ ID NO:1, amino acids
1439-1684 of SEQ ID NO:4); and ACP (nucleotides 21414-21626 of SEQ ID NO:1, amino
acids 1722-1792 of SEQ ID NO:4). Sequence comparisons and motif analysis reveal that
the AT encoded by EPOS B is specific for methylmalonyl-CoA. EPOS B should be involved
in the first polyketide chain extension by catalysing the Claisen-like condensation of the
2-methyl-4-thiazolecarboxyl-S-PCP starter group with the methylmalonyl-S-ACP, and the
concomitant reduction of the β-keto group of C17 to an enoyl.

epoC (nucleotides 21746-43519 of SEQ ID NO:1) codes for EPOS C (SEQ ID NO:5), a 
type I polyketide synthase consisting of 4 modules. The first module harbors a KS (nucle- 
otides 21860-23116 of SEQ ID NO:1, amino acids 39-457 of SEQ ID NO:5); a malonyl CoA-
specific AT (nucleotides 23431-24397 of SEQ ID NO:1, amino acids 563-884 of SEQ ID 
NO:5); a KR (nucleotides 25184-25942 of SEQ ID NO:1, amino acids 1147-1399 of SEQ ID 
NO:5); and an ACP (nucleotides 26045-26263 of SEQ ID NO:1, amino acids 1434-1506 of 
SEQ ID NO:5). This module incorporates an acetate extender unit (C14-C13) and reduces 
the β-keto group at C15 to the hydroxyl group that takes part in the final lactonization of the
epothilone macrolactone ring. The second module of EPOS C harbors a KS (nucleotides
26318-27595 of SEQ ID NO:1, amino acids 1524-1950 of SEQ ID NO:5); a malonyl CoA-
specific AT (nucleotides 27911-28876 of SEQ ID NO:1, amino acids 2056-2377 of SEQ ID
NO:5); a KR (nucleotides 29678-30429 of SEQ ID NO:1, amino acids 2645-2895 of SEQ ID
NO:5); and an ACP (nucleotides 30539-30759 of SEQ ID NO:1, amino acids 2932-3005 of
SEQ ID NO:5). This module incorporates an acetate extender unit (C12-C11) and reduces
the β-keto group at C13 to a hydroxyl group. Thus, the nascent polyketide chain of epothilone
corresponds to epothilone A, and the incorporation of the methyl side chain at C12 in
epothilone B would require a post-PKS C-methyltransferase activity. The formation of the
epoxide ring at C13-C12 would also require a post-PKS oxidation step. The third module of
EPOS C harbors a KS (nucleotides 30815-32092 of SEQ ID NO:1, amino acids 3024-3449
of SEQ ID NO:5); a malonyl CoA-specific AT (nucleotides 32408-33373 of SEQ ID NO:1,
amino acids 3555-3876 of SEQ ID NO:5); a DH (nucleotides 33401-33889 of SEQ ID NO:1, 
amino acids 3886-4048 of SEQ ID NO:5); an ER (nucleotides 35042-35902 of SEQ ID NO:1, 
amino acids 4433-4719 of SEQ ID NO:5); a KR (nucleotides 35930-36667 of SEQ ID NO:1, 
amino acids 4729-4974 of SEQ ID NO:5); and an ACP (nucleotides 36773-36991 of SEQ ID 
NO:1, amino acids 5010-5082 of SEQ ID NO:5). This module incorporates an acetate 
extender unit (C10-C9) and fully reduces the β-keto group at C11. The fourth module of
EPOS C harbors a KS (nucleotides 37052-38320 of SEQ ID NO:1, amino acids 5103-5525 
of SEQ ID NO:5); a methylmalonyl CoA-specific AT (nucleotides 38636-39598 of SEQ ID 
NO:1, amino acids 5631-5951 of SEQ ID NO:5); a DH (nucleotides 39635-40141 of SEQ ID 
NO:1, amino acids 5964-6132 of SEQ ID NO:5); an ER (nucleotides 41369-42256 of SEQ ID 
NO:1, amino acids 6542-6837 of SEQ ID NO:5); a KR (nucleotides 42314-43048 of SEQ ID 
NO:1, amino acids 6857-7101 of SEQ ID NO:5); and an ACP (nucleotides 43163-43378 of 
SEQ ID NO:1, amino acids 7140-7211 of SEQ ID NO:5). This module incorporates a 
propionate extender unit (C24 and C8-C7) and fully reduces the β-keto group at C9.

epoD (nucleotides 43524-54920 of SEQ ID NO:1) codes for EPOS D (SEQ ID NO:6), a 
type I polyketide synthase consisting of 2 modules. The first module harbors a KS 
(nucleotides 43626-44885 of SEQ ID NO:1, amino acids 35-454 of SEQ ID NO:6); a
methylmalonyl CoA-specific AT (nucleotides 45204-46166 of SEQ ID NO:1, amino acids
561-881 of SEQ ID NO:6); a KR (nucleotides 46950-47702 of SEQ ID NO:1, amino acids
1143-1393 of SEQ ID NO:6); and an ACP (nucleotides 47811-48032 of SEQ ID NO:1, amino
acids 1430-1503 of SEQ ID NO:6). This module incorporates a propionate extender unit
(C23 and C6-C5) and reduces the β-keto group at C7 to a hydroxyl group. The second module
harbors a KS (nucleotides 48087-49361 of SEQ ID NO:1, amino acids 1522-1946 of
SEQ ID NO:6); a methylmalonyl CoA-specific AT (nucleotides 49680-50642 of SEQ ID
NO:1, amino acids 2053-2373 of SEQ ID NO:6); a DH (nucleotides 50670-51176 of SEQ ID
NO:1, amino acids 2383-2551 of SEQ ID NO:6); a methyltransferase (MT, nucleotides
51534-52657 of SEQ ID NO:1, amino acids 2671-3045 of SEQ ID NO:6); a KR (nucleotides
53697-54431 of SEQ ID NO:1, amino acids 3392-3636 of SEQ ID NO:6); and an ACP
(nucleotides 54540-54758 of SEQ ID NO:1, amino acids 3673-3745 of SEQ ID NO:6). This
module incorporates a propionate extender unit (C21 or C22 and C4-C3) and reduces the β-keto
group at C5 to a hydroxyl group. This reduction is somewhat unexpected, since epo-
thilones contain a keto group at C5. Discrepancies of this kind between the deduced reduc- 
tive capabilities of PKS modules and the redox state of the corresponding positions in the 
final polyketide products have been, however, reported in the literature (see, for example, 
Schwecke, et al., Proc. Natl. Acad. Sci. USA 92: 7839-7843 (1995) and Schupp, et al.,
FEMS Microbiology Letters 159: 201-207 (1998)). An important feature of epothilones is the 
presence of gem-methyl side groups at C4 (C21 and C22). The second module of EPOS D 
is predicted to incorporate a propionate unit into the growing polyketide chain, providing one
methyl side chain at C4. This module also contains a methyltransferase domain integrated
into the PKS between the DH and the KR domains, in an arrangement similar to the one 
seen in the HMWP1 yersiniabactin synthase (Gehring, A.M., DeMoll, E., Fetherston, J.D., 
Mori, I., Mayhew, G.F., Blattner, F.R., Walsh, C.T., and Perry, R.D.: Iron acquisition in 
plague: modular logic in enzymatic biogenesis of yersiniabactin by Yersinia pestis. Chem. 
Biol. 5, 573-586, 1998). This MT domain in EPOS D is proposed to be responsible for the
incorporation of the second methyl side group (C21 or C22) at C4. 

epoE (nucleotides 54935-62254 of SEQ ID NO:1) codes for EPOS E (SEQ ID NO:7), a
type I polyketide synthase consisting of one module, harboring a KS (nucleotides 55028-
56284 of SEQ ID NO:1, amino acids 32-450 of SEQ ID NO:7); a malonyl CoA-specific AT
(nucleotides 56600-57565 of SEQ ID NO:1, amino acids 556-877 of SEQ ID NO:7); a DH
(nucleotides 57593-58087 of SEQ ID NO:1, amino acids 887-1051 of SEQ ID NO:7); a
probably nonfunctional ER (nucleotides 59366-60304 of SEQ ID NO:1, amino acids 1478-1790
of SEQ ID NO:7); a KR (nucleotides 60362-61099 of SEQ ID NO:1, amino acids 1810-2055
of SEQ ID NO:7); an ACP (nucleotides 61211-61426 of SEQ ID NO:1, amino acids 2093-
2164 of SEQ ID NO:7); and a thioesterase (TE) (nucleotides 61427-62254 of SEQ ID NO:1,
amino acids 2165-2439 of SEQ ID NO:7). The ER domain in this module harbors an active
site motif with some highly unusual amino acid substitutions that probably render this domain
inactive. The module incorporates an acetate extender unit (C2-C1), and reduces the β-keto
group at C3 to an enoyl group. Epothilones contain a hydroxyl group at C3, so this reduction also
appears to be excessive, as discussed for the second module of EPOS D. The TE domain of
EPOS E takes part in the release and cyclization of the grown polyketide chain via
lactonization between the carboxyl group of C1 and the hydroxyl group of C15.

Five ORFs are detected upstream of epoA in the sequenced region. The partially
sequenced orf1 has no homologues in the sequence databanks. The deduced protein product
(Orf 2, SEQ ID NO:10) of orf2 (nucleotides 3171-1900 on the reverse complement strand of
SEQ ID NO:1) shows strong similarities to hypothetical ORFs from Mycobacterium and
Streptomyces coelicolor, and more distant similarities to carboxypeptidases and
DD-peptidases of different bacteria. The deduced protein product of orf3 (nucleotides 3415-
5556 of SEQ ID NO:1), Orf 3 (SEQ ID NO:11), shows homologies to Na/H antiporters of
different bacteria. Orf 3 might take part in the export of epothilones from the producer strain.
orf4 and orf5 have no homologues in the sequence databanks.

Eleven ORFs are found downstream of epoE in the sequenced region. epoF
(nucleotides 62369-63628 of SEQ ID NO:1) codes for EPOS F (SEQ ID NO:8), a deduced protein
with strong sequence similarities to cytochrome P450 oxygenases. EPOS F may take part in
the adjustment of the redox state of the carbons C12, C5, and/or C3. The deduced protein
product of orf14 (nucleotides 67334-68251 of SEQ ID NO:1), Orf 14 (SEQ ID NO:22), shows
strong similarities to GI:3293544, a hypothetical protein with no proposed function from
Streptomyces coelicolor, and also to GI:2654559, the human embryonic lung protein. It is
also more distantly related to cation efflux system proteins like GI:2623026 from
Methanobacterium thermoautotrophicum, so it might also take part in the export of epothilones from
the producing cells. The remaining ORFs (orf6-orf13 and orf15) show no homologies to
entries in the sequence databanks.

Example 13: Recombinant Expression of Epothilone Biosynthesis Genes 

Epothilone synthase genes according to the present invention are expressed in hete- 
rologous organisms for the purposes of epothilone production at greater quantities than can 
be accomplished by fermentation of Sorangium cellulosum. A preferable host for hetero- 
logous expression is Streptomyces, e.g. Streptomyces coelicolor, which natively produces 
the polyketide actinorhodin. Techniques for recombinant PKS gene expression in this host 
are described in McDaniel et al, Science 262: 1546-1550 (1993) and Kao et al., Science 
265: 509-512 (1994). See also, Holmes et al., EMBO Journal 12(8): 3183-3191 (1993) and
Bibb et al., Gene 38: 215-226 (1985), as well as U.S. Patent Nos. 5,521,077, 5,672,491, and
5,712,146, which are incorporated herein by reference.

According to one method, the heterologous host strain is engineered to contain a
chromosomal deletion of the actinorhodin (act) gene cluster. Expression plasmids
containing the epothilone synthase genes of the invention are constructed by transferring DNA
from a temperature-sensitive donor plasmid to a recipient shuttle vector in E. coli (McDaniel
et al. (1993) and Kao et al. (1994)), such that the synthase genes are built up by
homologous recombination within the vector. Alternatively, the epothilone synthase gene cluster is
introduced into the vector by restriction fragment ligation. Following selection, e.g. as
described in Kao et al. (1994), DNA from the vector is introduced into the act-minus
Streptomyces coelicolor strain according to protocols set forth in Hopwood et al., Genetic
Manipulation of Streptomyces. A Laboratory Manual (John Innes Foundation, Norwich,
United Kingdom, 1985), incorporated herein by reference. The recombinant Streptomyces
strain is grown on R2YE medium (Hopwood et al. (1985)) and produces epothilones.
Alternatively, the epothilone synthase genes according to the present invention are
expressed in other host organisms such as pseudomonads, Bacillus, yeast, insect cells and/or
E. coli. PKS and NRPS genes are preferably expressed in E. coli using the pT7-7 vector,
which uses the T7 promoter. See, Tabor et al., Proc. Natl. Acad. Sci. USA 82: 1074-1078
(1985). In another embodiment, the expression vectors pKK223-3 and pKK223-2 are used
to express PKS and NRPS genes in E. coli, either in transcriptional or translational fusion,
behind the tac or trc promoter. Expression of PKS and NRPS genes in heterologous hosts,
which do not naturally have the phosphopantetheinyl (P-pant) transferases needed for
post-translational modification of PKS enzymes, requires the coexpression in the host of a P-pant
transferase, as described by Kealey et al., Proc. Natl. Acad. Sci. USA 95: 505-509 (1998).

Example 14: Isolation of Epothilones from Producing Strains 

Examples of cultivation, fermentation, and extraction procedures for polyketide
isolation, which are useful for extracting epothilones from both native and recombinant hosts
according to the present invention, are given in WO 93/10121, incorporated herein by
reference, in Example 57 of U.S. Patent No. 5,639,949, in Gerth et al., J. Antibiotics 49: 560-563
(1996), and in Swiss patent application no. 396/98, filed February 19, 1998, and U.S. patent
application no. 09/248,910 (which also discloses preferred mutant strains of Sorangium
cellulosum), both of which are incorporated herein by reference. The following are
procedures that are useful for isolating epothilones from cultured Sorangium cellulosum strains,
e.g., So ce90, and may also be used for the isolation of epothilone from recombinant hosts.

A: Cultivation of epothilone-producing strains:

Strain: Sorangium cellulosum Soce-90 or a recombinant host strain according to the
present invention.

Preservation of the strain: In liquid N2.

Media: Precultures and intermediate cultures: G52
       Main culture: 1B12

G52 Medium:

yeast extract, low in salt (BioSpringer, Maison Alfort, France)        2 g/l
MgSO4 (7 H2O)                                                          1 g/l
CaCl2 (2 H2O)                                                          1 g/l
soya meal defatted Soyamine 50T (Lucas Meyer, Hamburg, Germany)        2 g/l
potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland)       8 g/l
glucose anhydrous                                                      2 g/l
EDTA-Fe(III)-Na salt (8 g/l)                                           1 ml/l
pH 7.4, corrected with KOH
Sterilisation: 20 mins. 120 °C

1B12 Medium:

potato starch Noredux A-150 (Blattmann, Waedenswil, Switzerland)       20 g/l
soya meal defatted Soyamine 50T (Lucas Meyer, Hamburg, Germany)        11 g/l
EDTA-Fe(III)-Na salt                                                   8 mg/l
pH 7.8, corrected with KOH
Sterilisation: 20 mins. 120 °C

Addition of cyclodextrins and cyclodextrin derivatives:

Cyclodextrins (Fluka, Buchs, Switzerland, or Wacker Chemie,
Munich, Germany) in different concentrations are sterilised separately
and added to the 1B12 medium prior to seeding.
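
The per-litre recipes above scale linearly with batch volume. The following sketch, whose dictionary and function names are purely illustrative, computes the amounts to weigh in for a given volume and converts the EDTA-Fe(III)-Na stock addition of G52 into a final concentration.

```python
# Per-litre amounts taken from the G52 and 1B12 recipes (grams per litre).
# The EDTA-Fe(III)-Na addition to G52 is handled separately because it is
# specified as a volume of an 8 g/l stock solution.
G52_G_PER_L = {
    "yeast extract, low in salt": 2.0,
    "MgSO4 (7 H2O)": 1.0,
    "CaCl2 (2 H2O)": 1.0,
    "soya meal defatted Soyamine 50T": 2.0,
    "potato starch Noredux A-150": 8.0,
    "glucose anhydrous": 2.0,
}
MEDIUM_1B12_G_PER_L = {
    "potato starch Noredux A-150": 20.0,
    "soya meal defatted Soyamine 50T": 11.0,
    "EDTA-Fe(III)-Na salt": 0.008,  # 8 mg/l
}


def scale_recipe(recipe: dict[str, float], litres: float) -> dict[str, float]:
    """Return the grams of each component needed for the given volume."""
    if litres <= 0:
        raise ValueError("volume must be positive")
    return {name: amount * litres for name, amount in recipe.items()}


def stock_to_mg_per_l(stock_g_per_l: float, ml_per_l: float) -> float:
    """Final concentration (mg/l) from a stock (g/l equals mg/ml) dosed at ml/l."""
    return stock_g_per_l * ml_per_l


if __name__ == "__main__":
    print(scale_recipe(MEDIUM_1B12_G_PER_L, 10))
    print(stock_to_mg_per_l(8.0, 1.0), "mg/l EDTA-Fe(III)-Na in G52")
```

Note that 1 ml/l of the 8 g/l stock gives 8 mg/l, so both media contain the same final concentration of the iron salt.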

Cultivation: 1 ml of the suspension of Sorangium cellulosum Soce-90 from a liquid N2
ampoule is transferred to 10 ml of G52 medium (in a 50 ml Erlenmeyer flask) and incubated for
3 days at 180 rpm in an agitator at 30°C, 25 mm displacement. 5 ml of this culture is added
to 45 ml of G52 medium (in a 200 ml Erlenmeyer flask) and incubated for 3 days at 180 rpm
in an agitator at 30°C, 25 mm displacement. 50 ml of this culture is then added to 450 ml of
G52 medium (in a 2 litre Erlenmeyer flask) and incubated for 3 days at 180 rpm in an
agitator at 30°C, 50 mm displacement.

Maintenance culture: The culture is overseeded every 3-4 days, by adding 50 ml of culture to
450 ml of G52 medium (in a 2 litre Erlenmeyer flask). All experiments and fermentations are
carried out by starting with this maintenance culture.

Tests in a flask:

(i) Preculture in an agitating flask:

Starting with the 500 ml of maintenance culture, 1 x 450 ml of G52 medium are seeded with
50 ml of the maintenance culture and incubated for 4 days at 180 rpm in an agitator at 30°C,
50 mm displacement.

(ii) Main culture in the agitating flask:

40 ml of 1B12 medium plus 5 g/l 4-morpholine-propane-sulfonic acid (= MOPS) powder (in a
200 ml Erlenmeyer flask) are mixed with 5 ml of a 10x concentrated cyclodextrin solution,
seeded with 10 ml of preculture and incubated for 5 days at 180 rpm in an agitator at 30°C,
50 mm displacement.

Fermentation: Fermentations are carried out on a scale of 10 litres, 100 litres and 500 litres.
20 litre and 100 litre fermentations serve as an intermediate culture step. Whereas the
precultures and intermediate cultures are seeded, like the maintenance culture, with 10% (v/v),
the main cultures are seeded with 20% (v/v) of the intermediate culture. In contrast to the
agitating cultures, ingredients of the fermentation media are calculated on the final culture
volume including the inoculum. If, for example, 18 litres of medium + 2 litres of inoculum are
combined, then substances for 20 litres are weighed in, but are only mixed with 18 litres.
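
The seeding rule just described can be written out as a small calculation. This is a minimal sketch with illustrative function names: given the final culture volume and the inoculum fraction, it returns the volume of fresh medium and of inoculum, and the weigh-in is based on the final volume as stated above.

```python
def seeding_volumes(final_volume_l: float, inoculum_fraction: float) -> tuple[float, float]:
    """Return (medium volume, inoculum volume) in litres for a seeded culture."""
    if not 0 < inoculum_fraction < 1:
        raise ValueError("inoculum fraction must be between 0 and 1")
    inoculum = final_volume_l * inoculum_fraction
    return final_volume_l - inoculum, inoculum


def weigh_in(grams_per_litre: float, final_volume_l: float) -> float:
    """Grams of a component, calculated on the final volume including inoculum."""
    return grams_per_litre * final_volume_l


if __name__ == "__main__":
    # 20 litre intermediate culture seeded with 10% (v/v)
    medium, inoculum = seeding_volumes(20, 0.10)
    print(f"{medium:.1f} l medium + {inoculum:.1f} l inoculum")
    # Potato starch in G52 (8 g/l) is weighed in for the full 20 litres
    print(weigh_in(8.0, 20), "g potato starch")
```

For the 20 litre intermediate culture this reproduces the 18 litres of medium plus 2 litres of inoculum given in the example, with the media substances weighed in for 20 litres.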

Preculture in an agitating flask: 

Starting with the 500 ml maintenance culture, 4 x 450 ml of G52 medium (in a 2 litre Erlen- 
meyer flask) are each seeded with 50 ml thereof, and incubated for 4 days at 180 rpm in an 
agitator at 30°C, 50 mm displacement. 

Intermediate culture, 20 litres or 100 litres:

20 litres: 18 litres of G52 medium in a fermenter having a total volume of 30 litres are
seeded with 2 litres of preculture. Cultivation lasts for 3-4 days, and the conditions are: 30°C,
250 rpm, 0.5 litres of air per litre liquid per min, 0.5 bars excess pressure, no pH control.

100 litres: 90 litres of G52 medium in a fermenter having a total volume of 150 litres are
seeded with 10 litres of the 20 litre intermediate culture. Cultivation lasts for 3-4 days, and
the conditions are: 30°C, 150 rpm, 0.5 litres of air per litre liquid per min, 0.5 bars excess
pressure, no pH control.

Main culture, 10 litres, 100 litres or 500 litres:

10 litres: The media substances for 10 litres of 1B12 medium are sterilised in 7 litres of
water, then 1 litre of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution is added, and
seeded with 2 litres of a 20 litre intermediate culture. The duration of the main culture is
6-7 days, and the conditions are: 30°C, 250 rpm, 0.5 litres of air per litre of liquid per min,
0.5 bars excess pressure, pH control with H2SO4/KOH to pH 7.6 +/- 0.5 (i.e. no control
between pH 7.1 and 8.1).
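
The pH regime described for the main culture amounts to a dead-band control: nothing is dosed while the pH stays within 7.6 +/- 0.5. The sketch below illustrates that logic only. The function name and the returned strings are hypothetical, and how the actual fermenter controller doses acid or base is not specified in the text.

```python
def ph_control_action(ph: float, setpoint: float = 7.6, deadband: float = 0.5) -> str:
    """Decide which correction (if any) a dead-band pH controller would request.

    Assumption: H2SO4 is dosed to lower the pH and KOH to raise it; no action
    is taken while the pH lies within setpoint +/- deadband.
    """
    if ph > setpoint + deadband:
        return "add H2SO4"
    if ph < setpoint - deadband:
        return "add KOH"
    return "no action"


if __name__ == "__main__":
    for value in (6.9, 7.3, 7.6, 8.0, 8.4):
        print(value, "->", ph_control_action(value))
```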

100 litres: The media substances for 100 litres of 1B12 medium are sterilised in 70 litres of
water, then 10 litres of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are added,
and seeded with 20 litres of a 20 litre intermediate culture. The duration of the main culture is
6-7 days, and the conditions are: 30°C, 200 rpm, 0.5 litres air per litre liquid per min.,
0.5 bars excess pressure, pH control with H2SO4/KOH to pH 7.6 +/- 0.5. The chain of
seeding for a 100 litre fermentation is shown schematically as follows:

maintenance culture (500 ml), G52 medium (itself reseeded at 10%)
      |  10%
      v
precultures (4 x 500 ml), G52 medium
      |  10%
      v
intermediate culture (e.g. 20 l), G52 medium
      |  20%
      v
main culture (e.g. 100 l), 1B12 medium + HP-β-CD

500 litres: The media substances for 500 litres of 1B12 medium are sterilised in 350 litres of
water, then 50 litres of a sterile 10% 2-(hydroxypropyl)-β-cyclodextrin solution are added,
and the medium is seeded with 100 litres of a 100 litre intermediate culture. The duration of
the main culture is 6-7 days, and the conditions are: 30°C, 120 rpm, 0.5 litres air per litre
liquid per min., 0.5 bars excess pressure, pH control with H2SO4/KOH to pH 7.6 +/- 0.5.
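
The three main-culture recipes above use the same proportions: 70% water, 10% cyclodextrin stock solution and 20% inoculum by volume. The following is a minimal sketch of that arithmetic, not part of the original procedure. The function name `main_culture_makeup` and its parameters are hypothetical, and the default percentages are simply read off the 10, 100 and 500 litre recipes. It ignores the volume contributed by the media substances themselves.

```python
def main_culture_makeup(final_volume_l, water_pct=70, cd_solution_pct=10,
                        inoculum_pct=20, cd_stock_pct_wv=10.0):
    """Split a main-culture volume into water, cyclodextrin stock and inoculum.

    The default percentages are the proportions read off the 10, 100 and
    500 litre recipes; the source does not state them as a general rule.
    """
    if water_pct + cd_solution_pct + inoculum_pct != 100:
        raise ValueError("volume percentages must add up to 100")
    water_l = final_volume_l * water_pct / 100
    cd_solution_l = final_volume_l * cd_solution_pct / 100
    inoculum_l = final_volume_l * inoculum_pct / 100
    # Dilution of the 10 % (w/v) stock into the final culture volume
    cd_final_pct_wv = cd_stock_pct_wv * cd_solution_l / final_volume_l
    return {
        "water_l": water_l,
        "cd_solution_l": cd_solution_l,
        "inoculum_l": inoculum_l,
        "cd_final_pct_wv": cd_final_pct_wv,
        "cd_final_g_per_l": cd_final_pct_wv * 10,  # 1 % (w/v) equals 10 g/l
    }


for volume in (10, 100, 500):
    print(volume, main_culture_makeup(volume))
```

Under these assumptions the final cyclodextrin concentration works out to 1% (w/v), i.e. 10 g/l, at every scale, which agrees with the 10 g/l stated for the fermentations in sections C to E.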

Product analysis: 
Preparation of the sample: 

50 ml samples are mixed with 2 ml of polystyrene resin Amberlite XAD16 (Rohm + Haas, 
Frankfurt, Germany) and shaken at 180 rpm for one hour at 30°C. The resin is subsequently 
filtered using a 150 µm nylon sieve, washed with a little water and then added together with
the filter to a 15 ml Nunc tube. 
Elution of the product from the resin: 

10 ml of isopropanol (>99%) are added to the tube with the filter and the resin. Afterwards, 
the sealed tube is shaken for 30 minutes at room temperature on a Rota-Mixer (Labinco BV, 
Netherlands). Then, 2 ml of the liquid are centrifuged off and the supernatant is added using 
a pipette to HPLC tubes. 
HPLC analysis: 

Column:         Waters Symmetry C18, 100 x 4 mm, 3.5 µm, WAT066220
                + preliminary column 3.9 x 20 mm, WAT054225

Solvents:       A: 0.02 % phosphoric acid
                B: Acetonitrile (HPLC-Quality)



Gradient:       41 % B from 0 to 7 min.
                100 % B from 7.2 to 7.8 min.
                41 % B from 8 to 12 min.
Oven temp.:     30°C
Detection:      250 nm, UV-DAD detection
Injection vol.: 10 µl

Retention time: Epo A: 4.30 min; Epo B: 5.38 min
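
The gradient entries above give only the plateaus of the solvent programme. The sketch below expresses the programme as a function of time. It assumes, which the source does not state, that the pump ramps linearly across the two short gaps (7.0 to 7.2 min and 7.8 to 8.0 min). The function name `percent_b` is hypothetical.

```python
# Plateaus copied from the gradient entries: (start min, end min, % B)
PLATEAUS = [(0.0, 7.0, 41.0), (7.2, 7.8, 100.0), (8.0, 12.0, 41.0)]


def percent_b(t_min):
    """Return % solvent B (acetonitrile) at time t_min of the 12 min run."""
    if not PLATEAUS[0][0] <= t_min <= PLATEAUS[-1][1]:
        raise ValueError("time outside the 0-12 min programme")
    previous = None
    for start, end, pct in PLATEAUS:
        if start <= t_min <= end:
            return pct
        if t_min < start:
            # ASSUMPTION: linear ramp across the gap between two plateaus
            _, prev_end, prev_pct = previous
            fraction = (t_min - prev_end) / (start - prev_end)
            return prev_pct + fraction * (pct - prev_pct)
        previous = (start, end, pct)
    raise AssertionError("unreachable: time range was checked above")


for t in (0.0, 4.3, 5.38, 7.1, 7.5, 10.0):
    b = percent_b(t)
    print(f"t = {t:5.2f} min: {b:.1f} % B, {100 - b:.1f} % A")
```

Both retention times (4.30 and 5.38 min) fall inside the first 41% B plateau, so the two epothilones elute isocratically; the 100% B step serves as a column wash before re-equilibration.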



B: Effect of the addition of cyclodextrin and cyclodextrin derivatives on the epothilone
concentrations attained.

Cyclodextrins are cyclic (α-1,4)-linked oligosaccharides of α-D-glucopyranose with a
relatively hydrophobic central cavity and a hydrophilic external surface area.

The following are distinguished in particular (the figures in parenthesis give the number
of glucose units per molecule): α-cyclodextrin (6), β-cyclodextrin (7), γ-cyclodextrin (8),
δ-cyclodextrin (9), ε-cyclodextrin (10), ζ-cyclodextrin (11), η-cyclodextrin (12), and
θ-cyclodextrin (13). Especially preferred are δ-cyclodextrin and in particular α-cyclodextrin,
β-cyclodextrin or γ-cyclodextrin, or mixtures thereof.

Cyclodextrin derivatives are primarily derivatives of the above-mentioned cyclodextrins,
especially of α-cyclodextrin, β-cyclodextrin or γ-cyclodextrin, primarily those in which one or
more up to all of the hydroxy groups (3 per glucose radical) are etherified or esterified.
Ethers are primarily alkyl ethers, especially lower alkyl, such as methyl or ethyl ether, also
propyl or butyl ether; the aryl-hydroxyalkyl ethers, such as phenyl-hydroxy-lower-alkyl,
especially phenyl-hydroxyethyl ether; the hydroxyalkyl ethers, in particular
hydroxy-lower-alkyl ethers, especially 2-hydroxyethyl, hydroxypropyl such as 2-hydroxypropyl
or hydroxybutyl such as 2-hydroxybutyl ether; the carboxyalkyl ethers, in particular
carboxy-lower-alkyl ethers, especially carboxymethyl or carboxyethyl ether; derivatised
carboxyalkyl ethers, in particular derivatised carboxy-lower-alkyl ether in which the
derivatised carboxy is etherified or amidated carboxy (primarily aminocarbonyl, mono- or
di-lower-alkyl-aminocarbonyl, morpholino-, piperidino-, pyrrolidino- or piperazino-carbonyl,
or alkyloxycarbonyl), in particular lower alkoxycarbonyl-lower-alkyl ether, for example
methyloxycarbonylpropyl ether or ethyloxycarbonylpropyl ether; the sulfoalkyl ethers, in
particular sulfo-lower-alkyl ethers, especially sulfobutyl ether; cyclodextrins in which one
or more OH groups are etherified with a radical of formula

    -O-[alk-O-]n-H

wherein alk is alkyl, especially lower alkyl, and n is a whole number from 2 to 12, especially 2
to 5, in particular 2 or 3; cyclodextrins in which one or more OH groups are etherified with a
radical of formula

    [structural formula: -O-(alk-O)m-alk-CO-Y, with a substituent R' on the chain]

wherein R' is hydrogen, hydroxy, -O-(alk-O)z-H, -O-(alk(-R)-O-)p-H or
-O-(alk(-R)-O-)q-alk-CO-Y; alk in all cases is alkyl, especially lower alkyl; m, n, p, q and z
are a whole number from 1 to 12, preferably 1 to 5, in particular 1 to 3; and Y is OR1 or
NR2R3, wherein R1, R2 and R3, independently of one another, are hydrogen or lower alkyl, or R2
and R3 combined together with the linking nitrogen signify morpholino, piperidino, pyrrolidino
or piperazino; or branched cyclodextrins, in which etherifications or acetals with other sugar
molecules are present, especially glucosyl-, diglucosyl- (G2-β-cyclodextrin), maltosyl- or
dimaltosyl-cyclodextrin, or N-acetylglucosaminyl-, glucosaminyl-, N-acetylgalactosaminyl- or
galactosaminyl-cyclodextrin.

Esters are primarily alkanoyl esters, in particular lower alkanoyl esters, such as acetyl 
esters of cyclodextrins. 

It is also possible to have cyclodextrins in which two or more different said ether and 
ester groups are present at the same time. 

Mixtures of two or more of the said cyclodextrins and/or cyclodextrin derivatives may 
also exist. 

Preference is given in particular to α-, β- or γ-cyclodextrins or the lower alkyl ethers
thereof, such as methyl-β-cyclodextrin or in particular 2,6-di-O-methyl-β-cyclodextrin, or in
particular the hydroxy lower alkyl ethers thereof, such as 2-hydroxypropyl-α-,
2-hydroxypropyl-β- or 2-hydroxypropyl-γ-cyclodextrin.

The cyclodextrins or cyclodextrin derivatives are added to the culture medium
preferably in a concentration of 0.02 to 10, preferably 0.05 to 5, especially 0.1 to 4, for
example 0.1 to 2 percent by weight (w/v).

Cyclodextrins or cyclodextrin derivatives are known or may be produced by known
processes (see for example US 3,459,731; US 4,383,992; US 4,535,152; US 4,659,696; EP
0 094 157; EP 0 149 197; EP 0 197 571; EP 0 300 526; EP 0 320 032; EP 0 499 322; EP 0
503 710; EP 0 818 469; WO 90/12035; WO 91/11200; WO 93/19061; WO 95/08993; WO
96/14090; GB 2,189,245; DE 3,118,218; DE 3,317,064 and the references mentioned
therein, which also refer to the synthesis of cyclodextrins or cyclodextrin derivatives, or also: T.
Loftsson and M.E. Brewster (1996): Pharmaceutical Applications of Cyclodextrins: Drug
Solubilization and Stabilisation: Journal of Pharmaceutical Sciences 85 (10): 1017-1025; R.A.
Rajewski and V.J. Stella (1996): Pharmaceutical Applications of Cyclodextrins: In Vivo Drug
Delivery: Journal of Pharmaceutical Sciences 85 (11): 1142-1169).

All the cyclodextrin derivatives tested here are obtainable from the company Fluka,
Buchs, CH. The tests are carried out in 200 ml agitating flasks with 50 ml culture volume. As 
controls, flasks with adsorber resin Amberlite XAD-16 (Rohm & Haas, Frankfurt, Germany) 
and without any adsorber addition are used. After incubation for 5 days, the following 
epothilone titres can be determined by HPLC: 



Table 2:

Addition                          Order No.   Conc. [% w/v] (1)   Epo A [mg/l]   Epo B [mg/l]
Amberlite XAD-16 (v/v)                        2.0 (% v/v)         9.2            3.8
2-hydroxypropyl-β-cyclodextrin    56332       0.1                 2.7            1.7
2-hydroxypropyl-β-cyclodextrin    "           0.5                 4.7            3.3
2-hydroxypropyl-β-cyclodextrin    "           1.0                 4.7            3.4
2-hydroxypropyl-β-cyclodextrin    "           2.0                 4.7            4.1
2-hydroxypropyl-β-cyclodextrin    "           5.0                 1.7            0.5
2-hydroxypropyl-α-cyclodextrin    56330       0.5                 1.2            1.2
2-hydroxypropyl-α-cyclodextrin    "           1.0                 1.2            1.2
2-hydroxypropyl-α-cyclodextrin    "           5.0                 2.5            2.3
β-cyclodextrin                    28707       0.1                 1.6            1.3
β-cyclodextrin                    "           0.5                 3.6            2.5
β-cyclodextrin                    "           1.0                 4.8            3.7
β-cyclodextrin                    "           2.0                 4.8            2.9
β-cyclodextrin                    "           5.0                 1.1            0.4
methyl-β-cyclodextrin             66292       0.5                 0.8            <0.3
methyl-β-cyclodextrin             "           1.0                 <0.3           <0.3
methyl-β-cyclodextrin             "           2.0                 <0.3           <0.3
2,6-di-O-methyl-β-cyclodextrin    39915       1.0                 <0.3           <0.3
2-hydroxypropyl-γ-cyclodextrin    56334       0.1                 0.3            <0.3
2-hydroxypropyl-γ-cyclodextrin    "           0.5                 0.9            0.8
2-hydroxypropyl-γ-cyclodextrin    "           1.0                 1.1            0.7
2-hydroxypropyl-γ-cyclodextrin    "           2.0                 2.6            0.7
2-hydroxypropyl-γ-cyclodextrin    "           5.0                 5.0            1.1
no addition                                                       0.5            0.5

(1) Apart from Amberlite (% v/v), all percentages are by weight (% w/v).




A few of the cyclodextrins tested (2,6-di-O-methyl-β-cyclodextrin, methyl-β-cyclodextrin)
display no effect or a negative effect on epothilone production at the concentrations used.

1-2% 2-hydroxypropyl-β-cyclodextrin and β-cyclodextrin increase epothilone production in
the examples by 6 to 8 times compared with production using no cyclodextrins.
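
The fold increase quoted above can be checked against the Table 2 titres. The sketch below is an illustration only: the source does not say whether "6 to 8 times" refers to epothilone A, epothilone B or their sum, so it reports all three. The names `TITRES`, `NO_ADDITION` and `fold_increase` are hypothetical.

```python
# Titres in mg/l as (Epo A, Epo B), copied from Table 2
NO_ADDITION = (0.5, 0.5)
TITRES = {
    ("2-hydroxypropyl-beta-cyclodextrin", 1.0): (4.7, 3.4),
    ("2-hydroxypropyl-beta-cyclodextrin", 2.0): (4.7, 4.1),
    ("beta-cyclodextrin", 1.0): (4.8, 3.7),
    ("beta-cyclodextrin", 2.0): (4.8, 2.9),
}


def fold_increase(titre, control=NO_ADDITION):
    """Return the fold increase over the control for Epo A, Epo B and A + B."""
    a, b = titre
    control_a, control_b = control
    return a / control_a, b / control_b, (a + b) / (control_a + control_b)


for (name, conc), titre in TITRES.items():
    fold_a, fold_b, fold_total = fold_increase(titre)
    print(f"{name} at {conc} %: Epo A x{fold_a:.1f}, "
          f"Epo B x{fold_b:.1f}, A + B x{fold_total:.1f}")
```

On these figures the summed titre rises roughly 8-fold and epothilone B alone roughly 6- to 8-fold, while epothilone A alone rises somewhat more.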

C: 10 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin:

Fermentation is carried out in a 15 litre glass fermenter. The medium contains 10 g/l of
2-(hydroxypropyl)-β-cyclodextrin from Wacker Chemie, Munich, DE. Fermentation progress
is illustrated in Table 3. Fermentation is ended after 6 days and working up takes place.

Table 3: Progress of a 10 litre fermentation

duration of culture [d]   Epothilone A [mg/l]   Epothilone B [mg/l]
0                         0                     0
1                         0                     0
2                         0.5                   0.3
3                         1.8                   2.5
4                         3.0                   5.1
5                         3.7                   5.9
6                         3.6                   5.7



D: 100 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin:

Fermentation is carried out in a 150 litre fermenter. The medium contains 10 g/l of
2-(hydroxypropyl)-β-cyclodextrin. The progress of fermentation is illustrated in Table 4. The
fermentation is harvested after 7 days and worked up.

Table 4: Progress of a 100 litre fermentation

duration of culture [d]   Epothilone A [mg/l]   Epothilone B [mg/l]
0                         0                     0
1                         0                     0
2                         0.3                   0
3                         0.9                   1.1
4                         1.5                   2.3
5                         1.6                   3.3
6                         1.8                   3.7
7                         1.8                   3.5



E: 500 litre fermentation with 1% 2-(hydroxypropyl)-β-cyclodextrin:

Fermentation is carried out in a 750 litre fermenter. The medium contains 10 g/l of
2-(hydroxypropyl)-β-cyclodextrin. The progress of fermentation is illustrated in Table 5. The
fermentation is harvested after 7 days and worked up.

Table 5: Progress of a 500 litre fermentation

duration of culture [d]   Epothilone A [mg/l]   Epothilone B [mg/l]
0                         0                     0
1                         0                     0
2                         0                     0
3                         0.6                   0.6
4                         1.7                   2.2
5                         3.1                   4.5
6                         3.1                   5.1



F: Comparison example 10 litre fermentation without adding an adsorber: 

Fermentation is carried out in a 15 litre glass fermenter. The medium does not contain 
any cyclodextrin or other adsorber. The progress of fermentation is illustrated in Table 6. The 
fermentation is not harvested and worked up.

Table 6: Progress of a 10 litre fermentation without adsorber.

duration of culture [d]   Epothilone A [mg/l]   Epothilone B [mg/l]
0                         0                     0
1                         0                     0
2                         0                     0
3                         0                     0
4                         0.7                   0.7
5                         0.7                   1.0
6                         0.8                   1.3
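
Tables 3 and 6 describe 10 litre fermentations with and without cyclodextrin, so their peak titres can be compared directly. The following is a minimal sketch of that comparison using only the tabulated values; the names `WITH_CD`, `WITHOUT_ADSORBER`, `peak` and `peak_ratio` are hypothetical.

```python
DAYS = [0, 1, 2, 3, 4, 5, 6]

# Titres in mg/l per day of culture, copied from Table 3 and Table 6
WITH_CD = {
    "A": [0, 0, 0.5, 1.8, 3.0, 3.7, 3.6],
    "B": [0, 0, 0.3, 2.5, 5.1, 5.9, 5.7],
}
WITHOUT_ADSORBER = {
    "A": [0, 0, 0, 0, 0.7, 0.7, 0.8],
    "B": [0, 0, 0, 0, 0.7, 1.0, 1.3],
}


def peak(series):
    """Return (day, titre) of the highest titre in a daily series."""
    value = max(series)
    return DAYS[series.index(value)], value


def peak_ratio(compound):
    """Ratio of the peak titre with cyclodextrin to the peak titre without."""
    return peak(WITH_CD[compound])[1] / peak(WITHOUT_ADSORBER[compound])[1]


for compound in ("A", "B"):
    day_cd, titre_cd = peak(WITH_CD[compound])
    day_no, titre_no = peak(WITHOUT_ADSORBER[compound])
    print(f"Epothilone {compound}: {titre_cd} mg/l on day {day_cd} with cyclodextrin, "
          f"{titre_no} mg/l on day {day_no} without, "
          f"ratio {peak_ratio(compound):.1f}")
```

In this pair of runs the cyclodextrin culture peaks a day earlier and at roughly 4.5 times the titre for both epothilones. This is a single comparison and carries no statistical weight.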



G: Working up of the epothilones: Isolation from a 500 litre main culture: 

The volume of harvest from the 500 litre main culture of example 2D is 450 litres and is
separated using a Westfalia clarifying separator Type SA-20-06 (rpm = 6500) into the liquid
phase (centrifugate + rinsing water = 650 litres) and solid phase (cells = ca. 15 kg). The
main part of the epothilones is found in the centrifugate. The centrifuged cell pulp contains
< 15% of the determined epothilone portion and is not further processed. The 650 litre
centrifugate is then placed in a 4000 litre stirring vessel, mixed with 10 litres of Amberlite
XAD-16 (centrifugate:resin volume = 65:1) and stirred. After a period of contact of ca. 2
hours, the resin is centrifuged away in a Heine overflow centrifuge (basket content 40 litres;
rpm = 2800). The resin is discharged from the centrifuge and washed with 10-15 litres of
deionised water. Desorption is effected by stirring the resin twice, each time in portions with
30 litres of isopropanol in 30 litre glass stirring vessels for 30 minutes. Separation of the
isopropanol phase from the resin takes place using a suction filter. The isopropanol is then
removed from the combined isopropanol phases by adding 15-20 litres of water in a
vacuum-operated circulating evaporator (Schmid-Verdampfer) and the resulting water phase of
ca. 10 litres is extracted three times, each time with 10 litres of ethyl acetate. Extraction
is effected in 30 litre glass stirring vessels. The ethyl acetate extract is concentrated to
3-5 litres in a vacuum-operated circulating evaporator (Schmid-Verdampfer) and afterwards
concentrated to dryness in a rotary evaporator (Büchi type) under vacuum. The result is an
ethyl acetate extract of 50.2 g. The ethyl acetate extract is dissolved in 500 ml of methanol,
the insoluble portions filtered off using a folded filter, and the solution added to a 10 kg
Sephadex LH 20 column (Pharmacia, Uppsala, Sweden) (column diameter 20 cm, filling level ca.
1.2 m). Elution is effected with methanol as eluant. Epothilones A and B are present
predominantly in fractions 21-23 (at a fraction size of 1 litre). These fractions are
concentrated to dryness in a vacuum on a rotary evaporator (total weight 9.0 g). These
Sephadex peak fractions (9.0 g) are thereafter dissolved in 92 ml of
acetonitrile:water:methylene chloride = 50:40:2, the solution filtered through a folded filter
and added to a RP column (equipment Prepbar 200, Merck; 2.0 kg LiChrospher RP-18 Merck, grain
size 12 µm, column diameter 10 cm, filling level 42 cm; Merck, Darmstadt, Germany). Elution is
effected with acetonitrile:water = 3:7 (flow rate = 500 ml/min.; retention time of epothilone
A = ca. 51-59 mins.; retention time of epothilone B = ca. 60-69 mins.). Fractionation is
monitored with a UV detector at 250 nm. The fractions are concentrated to dryness under vacuum
on a Büchi-Rotavapor rotary evaporator. The weight of the epothilone A peak fraction is 700 mg,
and according to HPLC (external standard) it has a content of 75.1%. That of the epothilone B
peak fraction is 1980 mg, and the content according to HPLC (external standard) is 86.6%.
Finally, the epothilone A fraction (700 mg) is crystallised from 5 ml of ethyl acetate:toluene
= 2:3, and yields 170 mg of epothilone A pure crystallisate [content according to HPLC (% of
area) = 94.3%]. Crystallisation of the epothilone B fraction (1980 mg) is effected from 18 ml
of methanol and yields 1440 mg of epothilone B pure crystallisate [content according to HPLC
(% of area) = 99.2%]. m.p. (Epothilone B): e.g. 124-125 °C; 1H-NMR data for Epothilone B:
500 MHz-NMR, solvent: DMSO-d6. Chemical shift δ in ppm relative to TMS. s = singlet;
d = doublet; dd = doublet of doublets; m = multiplet

δ             (Multiplicity)            Integral (number of H)
7.34          (s)                       1
6.50          (s)                       1
5.28          (d)                       1
5.08          (d)                       1
4.46          (d)                       1
4.08          (m)                       1
3.47          (m)                       1
3.11          (m)                       1
2.83          (dd)                      1
2.64          (s)                       3
2.36          (m)                       2
2.09          (s)                       3
2.04          (m)                       1
1.83          (m)                       1
1.61          (m)                       1
1.47-1.24     (m)                       4
1.18          (s)                       6
1.13          (m)                       2
1.06          (d)                       3
0.89          (d + s, overlapping)      6
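
The masses and HPLC contents reported in the working-up procedure allow a rough estimate of how much epothilone survives the final crystallisation. The sketch below is illustrative only. The peak-fraction contents are measured against an external standard while the crystallisate contents are given as % of area, so the two are not strictly comparable and the recoveries are approximate. The function names are hypothetical.

```python
def pure_mass_mg(mass_mg, content_percent):
    """Mass of epothilone in a fraction, from its weight and HPLC content."""
    return mass_mg * content_percent / 100.0


def crystallisation_recovery(peak_mg, peak_content, crystal_mg, crystal_content):
    """Fraction of the epothilone in the peak fraction found in the crystals.

    ASSUMPTION: the external-standard content and the area-% content are
    treated as equivalent measures of purity, which is only approximately true.
    """
    return (pure_mass_mg(crystal_mg, crystal_content)
            / pure_mass_mg(peak_mg, peak_content))


# (peak mass mg, peak content %, crystallisate mass mg, crystallisate content %)
FRACTIONS = {
    "Epothilone A": (700, 75.1, 170, 94.3),
    "Epothilone B": (1980, 86.6, 1440, 99.2),
}

for name, values in FRACTIONS.items():
    print(f"{name}: {pure_mass_mg(values[0], values[1]):.0f} mg in peak fraction, "
          f"crystallisation recovery {crystallisation_recovery(*values):.0%}")
```

On these figures the crystallisation recovers roughly 83% of the epothilone B in its peak fraction but only about 30% of the epothilone A.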
Example 15: Medical Uses of Recombinantly Produced Epothilones

Pharmaceutical preparations or compositions comprising epothilones are used for
example in the treatment of cancerous diseases, such as various human solid tumors. Such
anticancer formulations comprise, for example, an active amount of an epothilone together
with one or more organic or inorganic, liquid or solid, pharmaceutically suitable carrier
materials. Such formulations are delivered, for example, enterally, nasally, rectally, orally, or
parenterally, particularly intramuscularly or intravenously. The dosage of the active
ingredient is dependent upon the weight, age, and physical and pharmacokinetic condition
of the patient and is further dependent upon the method of delivery. Because epothilones
mimic the biological effects of taxol, epothilones may be substituted for taxol in compositions
and methods utilizing taxol in the treatment of cancer. See, for example, U.S. Patent Nos.
5,496,804, 5,565,478, and 5,641,803, all of which are incorporated herein by reference.

For example, for treatments, epothilone B is supplied in individual 2 ml glass vials 
formulated as 1 mg/1 ml of clear, colorless intravenous concentrate. The substance is
formulated in polyethylene glycol 300 (PEG 300) and diluted with 50 or 100 ml 0.9% Sodium 
Chloride Injection, USP, to achieve the desired final concentration of the drug for infusion. It 
is administered as a single 30-minute intravenous infusion every 21 days (treatment three- 
weekly) for six cycles, or as a single 30-minute intravenous infusion every 7 days (weekly 
treatment). 

Preferably, for weekly treatment, the dose is between about 0.1 and about 6,
preferably about 0.1 and about 5 mg/m², more preferably about 0.1 and about 3 mg/m², even
more preferably 0.1 and 1.7 mg/m², most preferably about 0.3 and about 1 mg/m²; for
three-weekly treatment (treatment every three weeks or every third week) the dose is
between about 0.3 and about 18 mg/m², preferably about 0.3 and about 15 mg/m², more
preferably about 0.3 and about 12 mg/m², even more preferably about 0.3 and about 7.5
mg/m², still more preferably about 0.3 and about 5 mg/m², most preferably about 1.0 and
about 3.0 mg/m². This dose is preferably administered to the human by intravenous (i.v.)
administration during 2 to 180 min, preferably 2 to 120 min, more preferably during about 5
to about 30 min, most preferably during about 10 to about 30 min, e.g. during about 30 min.

While the present invention has been described with reference to specific
embodiments thereof, it will be appreciated that numerous variations, modifications, and
embodiments are possible, and accordingly, all such variations, modifications and
embodiments are to be regarded as being within the spirit and scope of the present
invention.



